From mboxrd@z Thu Jan 1 00:00:00 1970 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on inbox.vuxu.org X-Spam-Level: X-Spam-Status: No, score=-1.0 required=5.0 tests=HTML_MESSAGE, MAILING_LIST_MULTI,RCVD_IN_DNSWL_NONE autolearn=ham autolearn_force=no version=3.4.4 Received: (qmail 13065 invoked from network); 6 Feb 2021 02:11:08 -0000 Received: from minnie.tuhs.org (45.79.103.53) by inbox.vuxu.org with ESMTPUTF8; 6 Feb 2021 02:11:08 -0000 Received: by minnie.tuhs.org (Postfix, from userid 112) id 9815D9C798; Sat, 6 Feb 2021 12:11:07 +1000 (AEST) Received: from minnie.tuhs.org (localhost [127.0.0.1]) by minnie.tuhs.org (Postfix) with ESMTP id 676D69BA45; Sat, 6 Feb 2021 12:10:43 +1000 (AEST) Received: by minnie.tuhs.org (Postfix, from userid 112) id F37DB9BA45; Sat, 6 Feb 2021 12:10:41 +1000 (AEST) X-Greylist: delayed 1611 seconds by postgrey-1.36 at minnie.tuhs.org; Sat, 06 Feb 2021 12:10:39 AEST Received: from smtp4.via.net (smtp4.via.net [209.81.0.254]) by minnie.tuhs.org (Postfix) with ESMTPS id 435999BA3F for ; Sat, 6 Feb 2021 12:10:39 +1000 (AEST) Received: from mail.via.net (mail.via.net [157.22.3.34]) by smtp4.via.net (8.15.2/8.14.1-VIANET) with ESMTPS id 1161hf5m003161 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=FAIL); Fri, 5 Feb 2021 17:43:42 -0800 (PST) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.97.3 at smtp4.via.net Received: from [209.81.2.65] ([209.81.2.65]) by mail.via.net (8.15.2/8.14.1-VIANET) with ESMTP id 1161hf3Y014476; Fri, 5 Feb 2021 17:43:41 -0800 (PST) From: joe mcguckin Message-Id: Content-Type: multipart/alternative; boundary="Apple-Mail=_165EAB8B-E222-4119-97DE-1138AFBAD628" Mime-Version: 1.0 (Mac OS X Mail 13.4 \(3608.120.23.2.4\)) Date: Fri, 5 Feb 2021 17:43:31 -0800 In-Reply-To: <31039.1612574314@hop.toad.com> To: John Gilmore References: <20210205003315.GK13701@mcvoy.com> <26D923CF-5319-4207-BA28-6EFA0E3BB1F8@iitbombay.org> <20210205141820.GO13701@mcvoy.com> <31039.1612574314@hop.toad.com> X-Mailer: Apple Mail (2.3608.120.23.2.4) X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.4.3 (smtp4.via.net [209.81.0.254]); Fri, 05 Feb 2021 17:43:43 -0800 (PST) Subject: Re: [TUHS] FreeBSD behind the times? (was: Favorite unix design principles?) X-BeenThere: tuhs@minnie.tuhs.org X-Mailman-Version: 2.1.26 Precedence: list List-Id: The Unix Heritage Society mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Bakul Shah , TUHS main list Errors-To: tuhs-bounces@minnie.tuhs.org Sender: "TUHS" --Apple-Mail=_165EAB8B-E222-4119-97DE-1138AFBAD628 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=utf-8 When I was working for a big chip testing company, a STDIO vs MMAP = problem came up. The tester was at it=E2=80=99s heart a SPARC VME system running Solaris. = The tester read in =E2=80=98patterns=E2=80=99=20 from disk, it it literally took hours to read in all the test patterns. = At the scale of the large chip vendors, every minute you can=E2=80=99t test because the tester is booting, etc = means dollars are lost. We wrote a bunch of macros that replaced the STDIO file system I/O calls = with the equivalent=20 mmap calls. . It turns out STDIO does a lot of prefetching and has some = assumptions that you=E2=80=99re=20 going to read a file linearly from beginning to end, whereas we wanted = to jump around a lot in the pattern files. Pattern loading went from 4 hours to 30 minutes. Our customer was = ecstatic. Joe McGuckin ViaNet Communications joe@via.net 650-207-0372 cell 650-213-1302 office 650-969-2124 fax > On Feb 5, 2021, at 5:18 PM, John Gilmore wrote: >=20 > On Thu, Feb 04, 2021 at 09:17:54PM -0800, Bakul Shah wrote: >> Write(2)ing to a mapped page sounds pretty dodgy. Likely to get you >> in trouble in any case. Similarly read(2)ing.=20 >=20 > Uh, no. You misunderstand completely. >=20 > The purpose of the kernel is to provide a reliable interface to system > facilities, that lets processes NOT DEPEND on what other processes are > doing. >=20 > The decision about whether Tool X uses mmap() versus read() to access = a > file, or mmap() versus write() to change one, is a decision that DOES > NOT DEPEND on what Tool Y is doing. Tools X and Y may have been = written > by different groups in different decades. Tool X may have been = written > to use stdio, which used read(). Three years later, stdio got = rewritten > to use mmap() for speed, but that's invisible to the author of Tool X. > And maybe an end user in 2025 decides to use both Tool X and Tool Y on > the same file. So only much later will any malign interactions = between > read/write and mmap actually be noticed by end users. And the fix is > not to create new dependencies between Tool X, stdio, and Tool Y. It = is > to fix the kernel so they do not depend on each other! >=20 > Here is a real-life example from my own experience. >=20 > There is a long-standing bug in the Linux kernel, in which the = inotify() > system call simply didn't work on nested file systems. This caused a > long-standing bug in Ubuntu, which I reported in 2012 here: >=20 > https://bugs.launchpad.net/ubuntu/+source/rpcbind/+bug/977847 >=20 > The symptom was that after booting from a LiveCD image, "apt-get > install" for system services (in my case an NFS client package) = wouldn't > work. Turned out the system startup scripts used inotify() to notice > and start newly installed system services. The root cause was that > inotify failed because the root file system was an "overlayfs" that > overlaid a RAMdisk on top of the read-only LiveCD file system. The > people who implemented "overlayfs" didn't think inotify() was = important, > or they thought it would be too much work to make it actually meet its > specs, so they just made it ignore changes to the files in the = overlaid > file system. So the startup daemon's inotify() would never report the > creation of new files about the new services, because those files were > in the overlaying RAM disk, and so it would not start them and the = user > would notice the error. >=20 > The underlying overlayfs bug was reported in 2011 here: >=20 > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/882147 >=20 > As far as I know it has never been fixed. (The bug report was > closed in 2019 for one of the usual bogus reasons.) >=20 > The problem came because real tools (like systemd, or the tail = command) > actually started using inotify, assuming that as a well documented > kernel interface, it would actually meet its specs. And because a > completely unrelated other real tool (like the LiveCD installer) > actually started using overlayfs, assuming that as a well documented > kernel interface, it too would actually meet its specs. And then one > day somebody tried to use both those tools together and they failed. >=20 > That's why telling people "Don't use mmap() on the same file that you > use read() on" is an invalid attitude for a Real Kernel Maintainer. > Props to Larry McVoy for caring about this. Boos to the Linux > maintainers of overlayfs who didn't give a shit. >=20 > John > =09 --Apple-Mail=_165EAB8B-E222-4119-97DE-1138AFBAD628 Content-Transfer-Encoding: quoted-printable Content-Type: text/html; charset=utf-8 When = I was working for a big chip testing company, a STDIO vs MMAP problem = came up.

The tester = was at it=E2=80=99s heart a SPARC VME system running Solaris.  The = tester read in =E2=80=98patterns=E2=80=99 
from = disk, it it literally took hours to read in all the test patterns. At = the scale of the large chip vendors,
every minute = you can=E2=80=99t test because the tester is booting, etc means dollars = are lost.

We = wrote a bunch of macros that replaced the STDIO file system I/O calls = with the equivalent 
mmap calls. . It turns = out STDIO does a lot of prefetching and has some assumptions that = you=E2=80=99re 
going to read a file linearly = from beginning to end, whereas we wanted to jump around a lot in the = pattern files.

Pattern loading went from 4 hours to 30 minutes. Our customer = was ecstatic.

Joe McGuckin
ViaNet = Communications

650-207-0372 = cell
650-213-1302 office
650-969-2124 = fax



On Feb 5, 2021, at 5:18 PM, John Gilmore <gnu@toad.com> = wrote:

On Thu, Feb 04, 2021 at 09:17:54PM -0800, Bakul Shah = wrote:
Write(2)ing to = a mapped page sounds pretty dodgy. Likely to get you
in = trouble in any case. Similarly read(2)ing.
Uh, no.  You misunderstand completely.

The purpose of the kernel is to provide a reliable interface = to system
facilities, that lets processes NOT DEPEND on = what other processes are
doing.

The decision about whether Tool X uses mmap() versus read() = to access a
file, or mmap() versus write() to change one, = is a decision that DOES
NOT DEPEND on what Tool Y is = doing.  Tools X and Y may have been written
by = different groups in different decades.  Tool X may have been = written
to use stdio, which used read().  Three years = later, stdio got rewritten
to use mmap() for speed, but = that's invisible to the author of Tool X.
And maybe an end = user in 2025 decides to use both Tool X and Tool Y on
the = same file.  So only much later will any malign interactions = between
read/write and mmap actually be noticed by end = users.  And the fix is
not to create new dependencies = between Tool X, stdio, and Tool Y.  It is
to fix the = kernel so they do not depend on each other!

Here is a real-life example from my own experience.

There is a long-standing bug in the Linux = kernel, in which the inotify()
system call simply didn't = work on nested file systems.  This caused a
long-standing bug in Ubuntu, which I reported in 2012 = here:

 https://bugs.launchpad.net/ubuntu/+source/rpcbind/+bug/977847

The symptom was that after booting from = a LiveCD image, "apt-get
install" for system services (in = my case an NFS client package) wouldn't
work.  Turned = out the system startup scripts used inotify() to notice
and = start newly installed system services.  The root cause was that
inotify failed because the root file system was an = "overlayfs" that
overlaid a RAMdisk on top of the = read-only LiveCD file system.  The
people who = implemented "overlayfs" didn't think inotify() was important,
or they thought it would be too much work to make it actually = meet its
specs, so they just made it ignore changes to the = files in the overlaid
file system.  So the startup = daemon's inotify() would never report the
creation of new = files about the new services, because those files were
in = the overlaying RAM disk, and so it would not start them and the user
would notice the error.

The = underlying overlayfs bug was reported in 2011 here:

 
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/882147=

As far as I know it has never been fixed. =  (The bug report was
closed in 2019 for one of the = usual bogus reasons.)

The problem came = because real tools (like systemd, or the tail command)
actually started using inotify, assuming that as a well = documented
kernel interface, it would actually meet its = specs.  And because a
completely unrelated other real = tool (like the LiveCD installer)
actually started using = overlayfs, assuming that as a well documented
kernel = interface, it too would actually meet its specs.  And then one
day somebody tried to use both those tools together and they = failed.

That's why telling people "Don't = use mmap() on the same file that you
use read() on" is an = invalid attitude for a Real Kernel Maintainer.
Props to = Larry McVoy for caring about this.  Boos to the Linux
maintainers of overlayfs who didn't give a shit.

John


= --Apple-Mail=_165EAB8B-E222-4119-97DE-1138AFBAD628--