From mboxrd@z Thu Jan 1 00:00:00 1970 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on inbox.vuxu.org X-Spam-Level: X-Spam-Status: No, score=-0.8 required=5.0 tests=DKIM_INVALID,DKIM_SIGNED, HTML_MESSAGE,MAILING_LIST_MULTI,RCVD_IN_DNSWL_NONE autolearn=ham autolearn_force=no version=3.4.4 Received: (qmail 27430 invoked from network); 4 Jun 2020 14:20:51 -0000 Received: from minnie.tuhs.org (45.79.103.53) by inbox.vuxu.org with ESMTPUTF8; 4 Jun 2020 14:20:51 -0000 Received: by minnie.tuhs.org (Postfix, from userid 112) id 3BD419CA5A; Fri, 5 Jun 2020 00:20:47 +1000 (AEST) Received: from minnie.tuhs.org (localhost [127.0.0.1]) by minnie.tuhs.org (Postfix) with ESMTP id A5EA79C600; Fri, 5 Jun 2020 00:20:15 +1000 (AEST) Authentication-Results: minnie.tuhs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=bsdimp-com.20150623.gappssmtp.com header.i=@bsdimp-com.20150623.gappssmtp.com header.b="HcUEFjYi"; dkim-atps=neutral Received: by minnie.tuhs.org (Postfix, from userid 112) id B36049C600; Fri, 5 Jun 2020 00:20:11 +1000 (AEST) Received: from mail-qk1-f193.google.com (mail-qk1-f193.google.com [209.85.222.193]) by minnie.tuhs.org (Postfix) with ESMTPS id EA6B19C1C8 for ; Fri, 5 Jun 2020 00:20:10 +1000 (AEST) Received: by mail-qk1-f193.google.com with SMTP id g28so6162227qkl.0 for ; Thu, 04 Jun 2020 07:20:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bsdimp-com.20150623.gappssmtp.com; s=20150623; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=THU+6X6RbY8cUSlaV48hWe/B1kFF9flh8CPJgK+mvzU=; b=HcUEFjYiNAJdjRwOdIX3vC3OxSF+kbVSvSRgOHV5TsTxfG78StFfFwyByMGRaUMUVn hSXF1VgMN+8hlaHsMOSghOPq2uC+3PL+fd5EKhB59reGQLo6jj5PVdZxSv8/IMsz/1nW s0b1HWmUahpyEdMZfbX4V6TLFjvmXeaQxQ+DEuqsPQeCUCRUCXZWcFufNqif+VQd3PRf +jNyAiPFZowrNxhd6mncEGqiaLT2cIU7oPPMU+g4vbfkSQhqyaAvYmFC/7jFhTDEtlY9 +53d4I8bs2LxKXZB5zdgejqbdKzDnOiQfaN+35GO3CnqAwH5L1nk3t8g7ojh/rdm87LG WkLA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=THU+6X6RbY8cUSlaV48hWe/B1kFF9flh8CPJgK+mvzU=; b=b3IZ/CN+4RgKkI3DJvBPJQIx8/zfDM9cf9KC8NPYem9qlxMtFogROgkhu8BGISdCZu 7AEh+WxjbfcrzRbKPxuuxLm6yj6BItwjmq3Aldmwe5aJDINaJvZaVrX2/7sD/a4VHu+r hpxqzx7b/MLuGwA9c5yTeyZD7P3Oj+upxpWD/uoJurGcXRC4kR2AeJILoSmTrHHkLVxx tVdUHJG/wifTiJPK+XUF2qyLEG+dehndR0pflhbrsLPf+/BVg13SIEghX49yycoCimo5 rBXEOlYDWiI/wc9AHs0NrYND0nVOo3Zejm6T3vIn+TGh1mA6Iuiec8e6v1VzHU8lQ5iv 8o8w== X-Gm-Message-State: AOAM533ydrsoLeQL8yZBN5r5DXaFRcpBg7Zj8gM6RJiPMGdA4frOjMN5 rcBrtgujHImWYREgjGjxdGl325EoItqRL+2GyjkpUA== X-Google-Smtp-Source: ABdhPJxCSuUTHbPWhvDF6TczCSzlr+GgNlRwZa2pytfENaiSZJFvFCAOb1QCyt9rowKt4Vj8gSoRyl4pYr5j2phdE90= X-Received: by 2002:a37:99c6:: with SMTP id b189mr4500155qke.240.1591280410001; Thu, 04 Jun 2020 07:20:10 -0700 (PDT) MIME-Version: 1.0 References: <20200601145801.GE22016@mcvoy.com> <20200604090436.GJ279@server.rulingia.com> In-Reply-To: <20200604090436.GJ279@server.rulingia.com> From: Warner Losh Date: Thu, 4 Jun 2020 08:19:58 -0600 Message-ID: To: Peter Jeremy Content-Type: multipart/alternative; boundary="00000000000093923305a742d7c7" Subject: Re: [TUHS] non-blocking IO X-BeenThere: tuhs@minnie.tuhs.org X-Mailman-Version: 2.1.26 Precedence: list List-Id: The Unix Heritage Society mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: The Eunuchs Hysterical Society Errors-To: tuhs-bounces@minnie.tuhs.org Sender: "TUHS" --00000000000093923305a742d7c7 Content-Type: text/plain; charset="UTF-8" On Thu, Jun 4, 2020 at 3:06 AM Peter Jeremy wrote: > On 2020-Jun-01 07:58:02 -0700, Larry McVoy wrote: > >On Mon, Jun 01, 2020 at 01:32:56PM +1000, Dave Horsfall wrote: > >> On Mon, 1 Jun 2020, Rob Pike wrote: > >> > >> > I???m not quite sure why the Research lineage did not include > >> > non-blocking behaviour, especially in view of the man page comments. > >> > Maybe it was seen as against the Unix philosophy, with select() > >> > offering sufficient mechanism to avoid blocking (with open() the hard > >> > corner case)? > >> > > >> >That's it. Select was good enough for our purposes. > >> > >> After being dragged through both Berserkley and SysVile, I never did > get the > >> hang of poll()/select() etc,,, > > > >I'm sure you could, select is super handy, think a network server like > >apache. > > My view may be unpopular but I've always been disappointed that Unix > implemented blocking I/O only and then had to add various hacks to cover > up for the lack of asynchonous I/O. It's trivial to build blocking I/O > operations on top of asynchonous I/O operations. It's impossible to do > the opposite without additional functionality. > > I also found it disappointing that poll()/select() only worked on TTY and > network operations. HDDs are really slow compared to CPUs and it would be > really nice if a process could go and do something else whilst waiting for > a file to open. > Lest anybody think this is a theoretical concern, Netflix has spent quite a bit of effort to reduce the sources of latency in our system. The latency for open doesn't happen often, due to caching, but when it does this causes a hickup for nginx worker thread (since open is blocking). If you get enough hickups, you wind up consuming all your worker threads and latency for everybody suffers while waiting for these to complete (think flaky disk that suddenly takes a really long time for each of its I/Os for one example). We've hacked FreeBSD in various ways to reduce or eliminate this delay.... In FreeBSD we have to translate from the pathname to a vnode, and to do that we have to look up directories, indirect block tables, etc. All these are surprising places that one could bottleneck at... And if you try to access the vnode before all this is done, you'll wait for it (though in the case of sendfile it doesn't matter since that's async and only affect the one I/O)... So I could see how having a async open could introduce a lot of hair into the mix depending on how you do it. Without a robust callback/AST mechanism, my brain is recoiling from the EALREADY errors in sockets for things that are already in progress... reads and write are easy by comparison :) The kicker is that all of the kernel is callback driven. The upper half queues the request and then sleeps until the lower half signals it to wakeup. And that signal is often just a wakeup done from the completion routine in the original request. All of that would be useful in userland for high volume activity, none of it is exposed... Warner --00000000000093923305a742d7c7 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable


=
On Thu, Jun 4, 2020 at 3:06 AM Peter = Jeremy <peter@rulingia.com>= wrote:
On 2020-= Jun-01 07:58:02 -0700, Larry McVoy <lm@mcvoy.com> wrote:
>On Mon, Jun 01, 2020 at 01:32:56PM +1000, Dave Horsfall wrote:
>> On Mon, 1 Jun 2020, Rob Pike wrote:
>>
>> > I???m not quite sure why the Research lineage did not include=
>> > non-blocking behaviour, especially in view of the man page co= mments.
>> > Maybe it was seen as against the Unix philosophy, with select= ()
>> > offering sufficient mechanism to avoid blocking (with open() = the hard
>> > corner case)?
>> >
>> >That's it. Select was good enough for our purposes.
>>
>> After being dragged through both Berserkley and SysVile, I never d= id get the
>> hang of poll()/select() etc,,,
>
>I'm sure you could, select is super handy, think a network server l= ike
>apache.

My view may be unpopular but I've always been disappointed that Unix implemented blocking I/O only and then had to add various hacks to cover up for the lack of asynchonous I/O.=C2=A0 It's trivial to build blockin= g I/O
operations on top of asynchonous I/O operations.=C2=A0 It's impossible = to do
the opposite without additional functionality.

I also found it disappointing that poll()/select() only worked on TTY and network operations.=C2=A0 HDDs are really slow compared to CPUs and it woul= d be
really nice if a process could go and do something else whilst waiting for<= br> a file to open.

Lest anybody think this= is a theoretical concern, Netflix has spent quite a bit of effort to reduc= e the sources of latency in our system. The latency for open doesn't ha= ppen often, due to caching, but when it does this causes a hickup=C2=A0for = nginx worker thread (since open is blocking). If you get enough hickups, yo= u wind up consuming all your worker threads and latency for everybody suffe= rs while waiting for these to complete (think flaky disk that suddenly take= s a really long time for each of its I/Os for one example). We've hacke= d FreeBSD in various ways to reduce or eliminate this delay.... In FreeBSD = we have to translate from the pathname to a vnode, and to do that we have t= o look up directories, indirect block tables, etc. All these are surprising= places that one could bottleneck at... And if you try to access the vnode = before all this is done, you'll wait for it (though in the case of send= file it doesn't matter since that's async and only affect the one I= /O)...

So I could see how having a async open coul= d introduce a lot of hair into the mix depending on how you do it. Without = a robust callback/AST mechanism, my brain is recoiling from the EALREADY er= rors in sockets for things that are already in progress... reads and write = are easy by comparison :)=C2=A0 The kicker is that all of the kernel is cal= lback driven. The upper half queues the request and then sleeps until the l= ower half signals it to wakeup. And that signal is often just a wakeup done= from the completion routine in the original request. All of that would be = useful in userland for high volume activity, none of it is exposed...
=

Warner

--00000000000093923305a742d7c7--