From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on inbox.vuxu.org
X-Spam-Level: 
X-Spam-Status: No, score=-0.8 required=5.0 tests=DKIM_ADSP_CUSTOM_MED,
	DKIM_INVALID,DKIM_SIGNED,FREEMAIL_FROM,HTML_MESSAGE,MAILING_LIST_MULTI,
	RCVD_IN_DNSWL_NONE autolearn=ham autolearn_force=no version=3.4.4
Received: (qmail 18570 invoked from network); 5 Jun 2020 16:02:18 -0000
Received: from minnie.tuhs.org (45.79.103.53)
  by inbox.vuxu.org with ESMTPUTF8; 5 Jun 2020 16:02:18 -0000
Received: by minnie.tuhs.org (Postfix, from userid 112)
	id B93CD9CAEB; Sat,  6 Jun 2020 02:02:13 +1000 (AEST)
Received: from minnie.tuhs.org (localhost [127.0.0.1])
	by minnie.tuhs.org (Postfix) with ESMTP id 76D769C774;
	Sat,  6 Jun 2020 02:01:34 +1000 (AEST)
Authentication-Results: minnie.tuhs.org;
	dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="mzNMg3jh";
	dkim-atps=neutral
Received: by minnie.tuhs.org (Postfix, from userid 112)
 id 834629C774; Sat,  6 Jun 2020 02:01:30 +1000 (AEST)
Received: from mail-qt1-f177.google.com (mail-qt1-f177.google.com
 [209.85.160.177])
 by minnie.tuhs.org (Postfix) with ESMTPS id E69AE9C606
 for <tuhs@tuhs.org>; Sat,  6 Jun 2020 02:01:29 +1000 (AEST)
Received: by mail-qt1-f177.google.com with SMTP id k22so8862748qtm.6
 for <tuhs@tuhs.org>; Fri, 05 Jun 2020 09:01:29 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025;
 h=mime-version:references:in-reply-to:from:date:message-id:subject:to
 :cc; bh=vlA5J5JkJ/GRIivQZL4BhWkWAcm4mtp8Jwuz5d09TNo=;
 b=mzNMg3jhLnc9Lbdn2RpDp9bDBG4J5KETILORzFOg+TDHYtoy4C9PTA24cPMiqOR3cA
 AFdzobdikBcf08+d0EEsl6bAh3s7RFc570pciZioIMyto/YNfVBSBlBkJVCZ9KEEC8MZ
 Jrq+Njcq4tZSIqj1HxQmn23je8dCQfzWcMx2OoDoRmN59x3gEh2HQxYUegGpKyej5xKk
 HTNmDi9GizohcyoOekUutLGF3BV3MWj7V5BFsUAP7mv96PoOC4qIDMG28T/2/LFAplge
 WcvRrpXLM1vubGAaJEk1nLE05O6ONABqb8WG+xsKrdzKjGvEGEf60kXc1wE/BRJybJhJ
 aGMw==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:mime-version:references:in-reply-to:from:date
 :message-id:subject:to:cc;
 bh=vlA5J5JkJ/GRIivQZL4BhWkWAcm4mtp8Jwuz5d09TNo=;
 b=pnVNOTgdc7s8AGG68NTdgBmfYmA6I6nYbegvkzSOj39j0tvsEgSBoPiPjLCUKTRvqq
 BeqQaLffsOSsZlzRGwFqqA/d5YAl74mums3nYnIsvbA1G3EgjZRvHKu/Yw01VXEsNkwY
 hR+hcxro6AudLkMMUAv9FsNtbDK99i7KIbrLIHwRx4DF20vEcd0P0C0vGrZDmCT+ak1q
 LItlvePK7ixu13XlaMysZldxgRkbUJERKMzqfbcwcG5bOate3z2kKPnDCl7gzl83VG03
 tgBlO+ZOmuxt5CMadqclLe1iiPFDexPxXEHg8SUT+tGN2OPz961ouLK+O1bqAwx1IAKc
 PygA==
X-Gm-Message-State: AOAM530i1rRNDArgUkWSHvSuvW6VglLqG1v8Gd3dHxbalzhtROq7MSKi
 sdl3vR7Uttw3N+D3FGsIJKsPZ3eVQI7cMNWTSRaSjA==
X-Google-Smtp-Source: ABdhPJzg3crZ0bamrRgdolbvG+O+moYmmb5h9b81IUnNjdzzSHEy25QPuBebGAIMyrt6CIV8nIaITVTwHKo4KqzF6Dk=
X-Received: by 2002:ac8:37ad:: with SMTP id d42mr10763496qtc.352.1591372888915; 
 Fri, 05 Jun 2020 09:01:28 -0700 (PDT)
MIME-Version: 1.0
References: <ADC32296-73FD-46D4-A3D6-4EA03A956103@planet.nl>
 <CAC20D2McV_i0d=m33McRVoPda8ZVLawaKJxndYDLbqAnHLE_Wg@mail.gmail.com>
 <CANCZdfrefhG42UEx=WXrVNvrxR0oBpT1u7oz6_PitjZDAbSALw@mail.gmail.com>
 <CAKzdPgyP0Vtkwh50hQQpFuFaztw9i+4fi2BVAzPhm8upJh4e_Q@mail.gmail.com>
 <alpine.BSF.2.21.9999.2006011328160.31987@aneurin.horsfall.org>
 <20200601145801.GE22016@mcvoy.com> <20200604090436.GJ279@server.rulingia.com>
 <CANCZdfo289SQhA3tu5Z=RsDv3VrxgZ00r3M7mPNPCMEQiMvuQA@mail.gmail.com>
 <20200604165011.GC18437@mcvoy.com>
In-Reply-To: <20200604165011.GC18437@mcvoy.com>
From: Dan Cross <crossd@gmail.com>
Date: Fri, 5 Jun 2020 12:00:52 -0400
Message-ID: <CAEoi9W5svk9oATzkHQNrPOcFii6p7vbd+rYMFBopiqnvrXzMkQ@mail.gmail.com>
To: Larry McVoy <lm@mcvoy.com>
Content-Type: multipart/alternative; boundary="000000000000bfc3f605a7585fbb"
Subject: Re: [TUHS] non-blocking IO
X-BeenThere: tuhs@minnie.tuhs.org
X-Mailman-Version: 2.1.26
Precedence: list
List-Id: The Unix Heritage Society mailing list <tuhs.minnie.tuhs.org>
List-Unsubscribe: <https://minnie.tuhs.org/cgi-bin/mailman/options/tuhs>,
 <mailto:tuhs-request@minnie.tuhs.org?subject=unsubscribe>
List-Archive: <http://minnie.tuhs.org/pipermail/tuhs/>
List-Post: <mailto:tuhs@minnie.tuhs.org>
List-Help: <mailto:tuhs-request@minnie.tuhs.org?subject=help>
List-Subscribe: <https://minnie.tuhs.org/cgi-bin/mailman/listinfo/tuhs>,
 <mailto:tuhs-request@minnie.tuhs.org?subject=subscribe>
Cc: The Eunuchs Hysterical Society <tuhs@tuhs.org>
Errors-To: tuhs-bounces@minnie.tuhs.org
Sender: "TUHS" <tuhs-bounces@minnie.tuhs.org>

--000000000000bfc3f605a7585fbb
Content-Type: text/plain; charset="UTF-8"

On Thu, Jun 4, 2020 at 12:51 PM Larry McVoy <lm@mcvoy.com> wrote:

> On Thu, Jun 04, 2020 at 08:19:58AM -0600, Warner Losh wrote:
> > The kicker is that all of the kernel is callback driven. The
> > upper half queues the request and then sleeps until the lower half
> signals
> > it to wakeup. And that signal is often just a wakeup done from the
> > completion routine in the original request. All of that would be useful
> in
> > userland for high volume activity, none of it is exposed...
>
> Yeah, I've often wondered why this stuff wasn't exposed.  We already have
> signal handlers, seems like that maps.
>

Was it Rob who said that signals were really just for SIGKILL? Here,
signals would be gang-pressed into service as a general IPC mechanism. In
fairness, they've mutated that way, but they didn't start out that way.
While I obviously wasn't there, the strong impression I get is that by the
time people were seriously thinking about async IO in Unix, the die
had already been cast for better or worse.


> I tried to get the NFS guys at Sun to rethink the biod junk and do it like
> UFS does, where it queues something and gets a callback.  I strongly
> suspect
> that two processes, one to queue, one to handle callbacks, would be more
> efficient and actually faster than the biod nonsense.
>
> That's one of the arguments I lost unfortunately.
>
> Warner, exposing that stuff in FreeBSD is not really that hard, I suspect.
> Might be a fun project for a young kernel hacker with some old dude like
> you or me or someone, watching over it and thinking about the API.
>

I'm going to actually disagree with you here, Larry. While I think a basic
mechanism wouldn't be THAT hard to implement, it wouldn't compose nicely
with the existing primitives. I suspect the edge cases would be really
thorny, particularly without a real AST abstraction. For instance, what
happens if you initiate an async IO operation, then block on a `read`?
Where does the callback happen? If on the same thread, The real challenge
isn't providing the operation, it's integrating it into the existing model.

As a counter-point to the idea that it's completely unruly, in Akaros this
was solved in the C library: all IO operations were fundamentally
asynchronous, but the C library provided blocking read(), write(), etc by
building those from the async primitives. It worked well, but Akaros had
something akin to an AST environment and fine-grain scheduling decisions
were made in userspace: in Akaros the unit of processor allocation is a CPU
core, not a thread, and support exists for determining the status of all
cores allocated to a process. There are edge cases (you can't roll-your-own
mutex, for example, and the basic threading library does a lot of heavy
lifting for you making it challenging to integrate into the runtime of a
language that doesn't use the same ABI), but by and large it worked. It was
also provided by a kernel that was a pretty radical departure from a
Unix-like kernel.

        - Dan C.

--000000000000bfc3f605a7585fbb
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr"><div dir=3D"ltr">On Thu, Jun 4, 2020 at 12:51 PM Larry McV=
oy &lt;<a href=3D"mailto:lm@mcvoy.com">lm@mcvoy.com</a>&gt; wrote:<br></div=
><div class=3D"gmail_quote"><blockquote class=3D"gmail_quote" style=3D"marg=
in:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1e=
x">On Thu, Jun 04, 2020 at 08:19:58AM -0600, Warner Losh wrote:<br>
&gt; The kicker is that all of the kernel is callback driven. The<br>
&gt; upper half queues the request and then sleeps until the lower half sig=
nals<br>
&gt; it to wakeup. And that signal is often just a wakeup done from the<br>
&gt; completion routine in the original request. All of that would be usefu=
l in<br>
&gt; userland for high volume activity, none of it is exposed...<br>
<br>
Yeah, I&#39;ve often wondered why this stuff wasn&#39;t exposed.=C2=A0 We a=
lready have<br>
signal handlers, seems like that maps.=C2=A0 <br></blockquote><div><br></di=
v><div>Was it Rob who said that signals were really just for SIGKILL? Here,=
 signals would be gang-pressed into service as a general IPC mechanism. In =
fairness, they&#39;ve mutated that way, but they didn&#39;t start out that =
way. While I obviously wasn&#39;t there, the strong impression I get is tha=
t by the time people were seriously thinking about async IO in Unix, the di=
e had=C2=A0already been cast for better or worse.</div><div>=C2=A0</div><bl=
ockquote class=3D"gmail_quote" style=3D"margin:0px 0px 0px 0.8ex;border-lef=
t:1px solid rgb(204,204,204);padding-left:1ex">
I tried to get the NFS guys at Sun to rethink the biod junk and do it like<=
br>
UFS does, where it queues something and gets a callback.=C2=A0 I strongly s=
uspect<br>
that two processes, one to queue, one to handle callbacks, would be more<br=
>
efficient and actually faster than the biod nonsense.<br>
<br>
That&#39;s one of the arguments I lost unfortunately.<br>
<br>
Warner, exposing that stuff in FreeBSD is not really that hard, I suspect.<=
br>
Might be a fun project for a young kernel hacker with some old dude like<br=
>
you or me or someone, watching over it and thinking about the API.<br></blo=
ckquote><div><br></div><div>I&#39;m going to actually disagree with you her=
e, Larry. While I think a basic mechanism wouldn&#39;t be THAT hard to impl=
ement, it wouldn&#39;t compose nicely with the existing primitives. I suspe=
ct the edge cases would be really thorny, particularly without a real AST a=
bstraction. For instance, what happens if you initiate an async IO operatio=
n, then block on a `read`? Where does the callback happen? If on the same t=
hread, The real challenge isn&#39;t providing the operation, it&#39;s integ=
rating it into the existing model.</div><div><br></div><div>As a counter-po=
int to the idea that it&#39;s completely unruly, in Akaros this was solved =
in the C library: all IO operations were fundamentally asynchronous, but th=
e C library provided blocking read(), write(), etc by building those from t=
he async primitives. It worked well, but Akaros had something akin to an AS=
T environment and fine-grain scheduling decisions were made in userspace: i=
n Akaros the unit of processor allocation is a CPU core, not a thread, and =
support exists for determining the status of all cores allocated to a proce=
ss. There are edge cases (you can&#39;t roll-your-own mutex, for example, a=
nd the basic threading library does a lot of heavy lifting for you making i=
t challenging to integrate into the runtime of a language that doesn&#39;t =
use the same ABI), but by and large it worked. It was also provided by a ke=
rnel that was a pretty radical departure from a Unix-like kernel.</div><div=
><br></div><div>=C2=A0 =C2=A0 =C2=A0 =C2=A0 - Dan C.</div><div><br></div></=
div></div>

--000000000000bfc3f605a7585fbb--