From mboxrd@z Thu Jan  1 00:00:00 1970
X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/11452
Path: news.gmane.org!.POSTED!not-for-mail
From: Benjamin Slade <slade@jnanam.net>
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: ENOSYS/EOPNOTSUPP fallback?
Date: Sun, 11 Jun 2017 14:57:59 -0600
Message-ID: <8760g2utrc.fsf@jnanam.net>
References: <87d1ajdsp8.fsf@utah.edu> <b6bc4261.dNq.dMV.B.pUrCBw@mailjet.com>
Reply-To: musl@lists.openwall.com
NNTP-Posting-Host: blaine.gmane.org
Mime-Version: 1.0
Content-Type: text/plain
X-Trace: blaine.gmane.org 1497218685 16986 195.159.176.226 (11 Jun 2017 22:04:45 GMT)
X-Complaints-To: usenet@blaine.gmane.org
NNTP-Posting-Date: Sun, 11 Jun 2017 22:04:45 +0000 (UTC)
User-Agent: mu4e 0.9.18; emacs 25.2.1
Cc: musl@lists.openwall.com
To: Joakim Sindholt <opensource@zhasha.com>
Original-X-From: musl-return-11465-gllmg-musl=m.gmane.org@lists.openwall.com Mon Jun 12 00:04:40 2017
Return-path: <musl-return-11465-gllmg-musl=m.gmane.org@lists.openwall.com>
Envelope-to: gllmg-musl@m.gmane.org
Original-Received: from mother.openwall.net ([195.42.179.200])
	by blaine.gmane.org with smtp (Exim 4.84_2)
	(envelope-from <musl-return-11465-gllmg-musl=m.gmane.org@lists.openwall.com>)
	id 1dKAyO-0004Bt-A8
	for gllmg-musl@m.gmane.org; Mon, 12 Jun 2017 00:04:40 +0200
Original-Received: (qmail 25682 invoked by uid 550); 11 Jun 2017 22:04:43 -0000
Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm
Precedence: bulk
List-Post: <mailto:musl@lists.openwall.com>
List-Help: <mailto:musl-help@lists.openwall.com>
List-Unsubscribe: <mailto:musl-unsubscribe@lists.openwall.com>
List-Subscribe: <mailto:musl-subscribe@lists.openwall.com>
List-ID: <musl.lists.openwall.com>
Original-Received: (qmail 31854 invoked from network); 11 Jun 2017 20:58:14 -0000
Original-Sender: Benjamin Slade <beoram@gmail.com>
In-reply-to: <b6bc4261.dNq.dMV.B.pUrCBw@mailjet.com>
Xref: news.gmane.org gmane.linux.lib.musl.general:11452
Archived-At: <http://permalink.gmane.org/gmane.linux.lib.musl.general/11452>

Thank you for the extensive reply.

Just to be clear: I'm only an end-user of flatpak, &c. As far as I can
tell, flatpak uses `ostree`, which assumes that the libc will take care
of the `dd`-style fallback (I got the impression that flatpak isn't
calling `fallocate` directly).

Do you think there's an obvious avenue for following up on this?
Admittedly this is an edge case that won't necessarily affect musl users
on ext4, but it will affect musl users on ZFS (and, I believe,
f2fs). Do you think `ostree` shouldn't rely on the libc for fallback? Or
should ZFS on Linux implement a fallback for fallocate?
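
To make the first option concrete, here is a minimal sketch of what
"not relying on the libc for fallback" could look like in an
application, assuming preallocation is only wanted as a hint. The name
`preallocate_hint` is hypothetical (it is not an `ostree` or flatpak
function), and which error codes to tolerate is an assumption on my
part:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>

/*
 * Hypothetical helper: ask for preallocation, but treat "this file
 * system cannot do it" as success, because the request is only a hint.
 * Returns 0 on success or when preallocation is unsupported, otherwise
 * the error number reported by posix_fallocate().
 */
int preallocate_hint(int fd, off_t offset, off_t len)
{
    /* posix_fallocate() returns the error number; it does not set errno. */
    int err = posix_fallocate(fd, offset, len);

    /* Assumption: these two codes mean "not supported here". */
    if (err == EOPNOTSUPP || err == ENOSYS)
        return 0;

    return err; /* 0, or a real failure such as ENOSPC or EBADF */
}
```

A caller would go on to write normally whenever this returns 0, and
would still see genuine failures such as ENOSPC.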

--
Benjamin Slade
  `(pgp_fp: ,(21BA 2AE1 28F6 DF36 110A 0E9C A320 BBE8 2B52 EE19))
    '(sent by mu4e on Emacs running under GNU/Linux . https://gnu.org )
       '(Choose Linux, Choose Freedom . https://linux.com )


On 2017-06-05T06:46:33-0600, Joakim Sindholt <opensource@zhasha.com> wrote:

 > On Sun, Jun 04, 2017 at 09:22:27PM -0600, Benjamin Slade wrote:
 > > I ran into what is perhaps a weird edge case. I'm running a system with
 > > musl that uses a ZFS root fs. When I was trying to install some
 > > flatpaks, I got an `fallocate` failure, with no `dd` fallback. Querying
 > > the flatpak team, the fallback to `dd` seems to be something which glibc
 > > does (and so the other components assume will be taken care of).
 > >
 > > Here is the exchange regarding this issue:
 > > https://github.com/flatpak/flatpak/issues/802

 > To quote the glibc source file linked in the bug:

 >   /* Minimize data transfer for network file systems, by issuing
 >      single-byte write requests spaced by the file system block size.
 >      (Most local file systems have fallocate support, so this fallback
 >      code is not used there.)  */

 >   /* NFS clients do not propagate the block size of the underlying
 >      storage and may report a much larger value which would still
 >      leave holes after the loop below, so we cap the increment at
 >      4096.  */

 >   /* Write a null byte to every block.  This is racy; we currently
 >      lack a better option.  Compare-and-swap against a file mapping
 >      might address local races, but requires interposition of a signal
 >      handler to catch SIGBUS.  */

 > Which leaves 2 massive bugs:
 > 1) it leaves unallocated gaps, both because of the NFS thing and
 > because other file systems may work on entirely different
 > principles that are not accounted for here, and
 > 2) it overwrites data currently being written to the file as it's being
 > forcibly allocated (which might be doing nothing, think deduplication).
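
For readers who have not looked at such a fallback, the following is a
simplified sketch of the approach the quoted comments describe: touch
one byte per block, with the step capped at 4096. It is an illustration
written for this thread, not glibc's actual code, and
`emulate_fallocate` is a hypothetical name. The comments mark where the
two problems listed above come from.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Illustrative user-space emulation (NOT glibc's code). Returns 0 or an
 * error number. Simplification: no check that offset + len fits in off_t.
 */
int emulate_fallocate(int fd, off_t offset, off_t len)
{
    struct stat st;

    if (offset < 0 || len <= 0)
        return EINVAL;
    if (fstat(fd, &st) != 0)
        return errno;

    /* Bug 1: the reported block size is only a guess at the real
       allocation unit, so gaps may remain unallocated. */
    off_t step = st.st_blksize;
    if (step <= 0 || step > 4096)
        step = 4096;

    off_t end = offset + len;

    /* Start so that the final touch lands exactly on the last byte. */
    for (off_t pos = offset + (len - 1) % step; pos < end; pos += step) {
        if (pos < st.st_size) {
            char c = 0;
            ssize_t n = pread(fd, &c, 1, pos);
            if (n < 0)
                return errno;
            if (n == 1 && c != 0)
                continue; /* byte is non-zero, so the block holds data */
            /* Bug 2: another writer may store a byte here between the
               pread above and the pwrite below; we would then clobber it. */
        }
        ssize_t w = pwrite(fd, "", 1, pos); /* writes a single zero byte */
        if (w != 1)
            return w < 0 ? errno : EIO;
    }
    return 0;
}
```

Nothing in this loop can tell whether the file system actually reserved
space for the zero bytes it wrote.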

 > This is not a viable general solution and furthermore fallocate is
 > mostly just an optimization hint. If it's a hard requirement of your
 > software I would suggest implementing it in your file system. These
 > operations can only be safely implemented in the kernel.

 > An example:

 > MyFS uses write time deduplication on unused blocks (and blocks with all
 > zeroes fall under the umbrella of unused). Glibc starts its dance where
 > it writes a zero byte to the beginning of each block it perceives and
 > for now let's just say it has the right block size. MyFS just trashes
 > these writes immediately without touching the disk and updates the size
 > metadata which gets lazily written at some point. There's only 400k left
 > on the disk and your fallocate of 16G will succeed and run exceptionally
 > fast to boot, but it will have allocated nothing and your next write
 > fails with ENOSPC.

 > Another example:

 > myutil has 2 threads running. One thread is constantly writing things to
 > a file. The other thread sometimes writes large chunks of data to the
 > file, so it hints the kernel to allocate these large chunks by
 > calling fallocate, and only then takes the lock(s) held internally to
 > synchronize the threads. The first thread finds it needs to update
 > something in the section currently being fallocated by glibc's
 > algorithm. Suddenly zero bytes appear at 4k intervals for no discernible
 > reason, overwriting the data.


 > Personally I would look into making sure that flatpak uses
 > fallocate only as an optimization. The most reliable thing I can think of
 > otherwise would be to do the locking necessary (if any) in the program
 > and fill the entire target section of the file with data from
 > /dev/urandom, but even that may fail spectacularly with transparent
 > compression (unlikely as that is).
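
A rough sketch of that last suggestion, assuming the caller already
holds whatever lock protects the range and that the range contains no
data worth keeping (every byte in it is overwritten). The name
`fill_with_random` and the 4096-byte buffer size are hypothetical
choices made for illustration:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: overwrite [offset, offset + len) with bytes from
 * /dev/urandom so the file system has to store real data there.
 * Returns 0 or an error number. The caller must hold any needed locks.
 */
int fill_with_random(int fd, off_t offset, off_t len)
{
    unsigned char buf[4096];
    int err = 0;

    if (offset < 0 || len < 0)
        return EINVAL;
    if (len == 0)
        return 0;

    int rnd = open("/dev/urandom", O_RDONLY);
    if (rnd < 0)
        return errno;

    while (len > 0 && err == 0) {
        size_t want = len < (off_t)sizeof buf ? (size_t)len : sizeof buf;
        ssize_t got = read(rnd, buf, want);
        if (got < 0) {
            if (errno == EINTR)
                continue;
            err = errno;
            break;
        }
        if (got == 0) {
            err = EIO; /* unexpected end of the random source */
            break;
        }

        /* pwrite may be short, so loop until this chunk is written. */
        size_t done = 0;
        while (done < (size_t)got) {
            ssize_t n = pwrite(fd, buf + done, (size_t)got - done,
                               offset + (off_t)done);
            if (n < 0 && errno == EINTR)
                continue;
            if (n <= 0) {
                err = n < 0 ? errno : EIO;
                break;
            }
            done += (size_t)n;
        }
        if (err == 0) {
            offset += got;
            len -= got;
        }
    }

    close(rnd);
    return err;
}
```

This costs a full write of the range, which is the price of not
trusting a hint, and it still assumes the file system stores the data
it is given.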

 > Hope this was at least somewhat helpful.