From: Rico Pajarola
Date: Fri, 5 Feb 2021 18:22:32 -0800
To: Dave Horsfall
Subject: Re: [TUHS] FreeBSD behind the times? (was: Favorite unix design principles?)
Cc: The Eunuchs Hysterical Society

On Fri, Feb 5, 2021 at 12:51 PM Dave Horsfall <dave@horsfall.org> wrote:
>
> [...]
>
> Thanks; I'd heard that ZFS was a compressed file system, so I stopped
> right there (I had lots of experience in recovering from corrupted RK05s,
> and didn't need any more trouble).

That's funny; for me this is the main reason to use ZFS. What really sets
ZFS apart from everything else is the lack of trouble and its resilience
to failures. We used to have lots and lots of ZFS filesystems at work, and
I've been using ZFS exclusively at home ever since. I have run into a
non-importable ZFS file system (all drives are there, but it's corrupt and
won't import) only once, and I was able to fix it with zdb.
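For the archives, the usual sequence for a pool that refuses to import
looks roughly like the sketch below. This is not what we actually ran back
then; "tank" is just a placeholder pool name, and the exact flags vary a
bit between ZFS versions:

    # See what the system thinks is importable (pool name, GUID, state).
    zpool import

    # Try a read-only import first, so nothing on the pool gets modified.
    zpool import -o readonly=on tank

    # -F asks for recovery by rewinding to an earlier transaction group,
    # discarding the last few seconds of writes; -n only reports whether
    # that would succeed without actually doing it.
    zpool import -F -n tank
    zpool import -F tank

    # zdb -e examines the on-disk state of a pool that is not imported,
    # which is where the real digging starts if the above fails.
    zdb -e tank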
ZFS compression is completely optional, and not even on by default. I've
only tried it once and found it cost too much performance on something
that's not very fast to begin with, but I don't think it affects data
recovery much (the way ZFS stripes data makes traditional data recovery
tools pretty useless anyway).

I personally don't care about purity of implementation, because everything
is a trade-off. The argument really reminds me of Tanenbaum's criticism of
the Linux monolithic kernel (was Tanenbaum right? Maybe, but who cares,
because Linux took over the world and Minix didn't, so from a practical
point of view Linus was right). The other one it reminds me of is the
criticism of TCP's "blatant layering violation" (vs OSI). But IMHO the
critics were just jealous of the cool things they couldn't do because they
needed to respect the division of labor along those pesky layers.

I remember reading on one of the Sun engineers' blogs (remember when Sun
allowed their engineers to keep blogs about Solaris development? Good
times!) about the heated discussions they had over the ARC and bypassing
the page cache. I don't remember the actual arguments for it, but it was
certainly not a decision that was made out of laziness.

Performance-wise, ZFS is not the best, and if that's all you care about,
there are better options. It needs a lot of tuning just to reach
"acceptable", and it definitely does not play well with doing other stuff
on the same machine (it pretty much assumes that your storage appliance is
dedicated). It has particularly abysmal performance when you do lots of
small random writes and then try to read them back in order, but if you
care about not losing your data, it's second to none.

In $JOB-1 (almost 15 years ago), we spent a few weeks stress testing ZFS.
The setup was 24 x 4 TB SATA drives, divided into two 12-drive raidz2
vdevs or something like that. All tests were done while it was busy
reading/writing checksummed test files at full speed, 1 GB/s or so (see?
Performance was not impressive. We definitely got a lot more out of that
hardware with UFS). What was absolutely stunning was the fact that in all
our tests it never served one bit of corrupted data. It either had it, or
it returned an error.

We tortured the storage in any way we could imagine: wiggled the cables,
yanked out drives, used dd to overwrite random parts or entire drives,
smashed a drive with a hammer and put it back in, put in drives of the
wrong size, put in known bad drives, yanked out drives while it was
resilvering, put drives back into a different slot, overwrote stuff while
it was resilvering. We unplugged the entire storage, plugged it into
another machine and imported it, then plugged the drives back into the
first machine in a different order. We even did things like "copy a drive
onto a spare with dd, remove 3 drives, and then substitute the spare drive
for the removed one" (this led to some data loss because making the copy
was not atomic, but most of the data was recoverable). And no matter what
we did, it just kept going unless the data was simply not there, and even
then it kept serving the files (or parts of files) that were available and
indicated exactly which files were affected by data loss. And when you put
the drives back (or restored the overwritten parts), it would continue as
if nothing had ever happened.

If you've ever wrestled with a hardware RAID controller, or VxFS/JFS/HPFS,
or mdadm, you know that none of that can be taken for granted, and that
doing any of the stupid things mentioned above would most likely lead to
complete data loss and/or serving lots of random corrupted data, with no
way to tell what had been corrupted.

I remember some performance issues with mmap, but I don't remember how we
fixed them. Probably just sucked it up. Using ZFS was not for maximum
performance.
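P.S. For anyone who hasn't played with it: the "indicated exactly which
files were affected" part above is what a scrub plus "zpool status -v"
gives you. A minimal sketch ("tank" is again just a placeholder pool name):

    # Walk every block in the pool and verify checksums against the
    # available redundancy, repairing whatever can be repaired.
    zpool scrub tank

    # -v lists the individual files hit by any permanent (unrepairable)
    # errors, so you know exactly what to restore from backup.
    zpool status -v tank

    # After the affected files have been restored or deleted, reset the
    # error counters.
    zpool clear tank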