From: Arthur Krewat <krewat@kilonet.net>
To: tuhs@minnie.tuhs.org
Date: Mon, 30 Aug 2021 12:46:59 -0400
Subject: Re: [TUHS] Is it time to resurrect the original dsw (delete with switches)?

On 8/30/2021 9:06 AM, Norman Wilson wrote:
> A key point is that the character of the errors they
> found suggests it's not just the disks one ought to worry
> about, but all the hardware and software (much of the latter
> inside disks and storage controllers and the like) in the
> storage stack.

I had a pair of Dell MD1000s, full of SATA drives (28 total), with the SATA/SAS interposers on the back of each drive. I was getting checksum errors in ZFS on a handful of the drives. I took the time to build a new array on a Supermicro backplane, and there were no more errors with the exact same drives. My theory is it was either the interposers or the SAS backplane/controllers in the MD1000. Without ZFS, who knows how swiss-cheesy my data would be.
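
For anyone who hasn't looked at how ZFS pulls that off: every block pointer stores a checksum of the block it points to, kept apart from the data itself, and the read path recomputes and compares before believing anything the disk hands back. A rough sketch of the idea in C - not the actual ZFS code, and the cksum/verify_block names are made up here - using the Fletcher-4 sums ZFS defaults to for data blocks:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct cksum { uint64_t word[4]; };

/* Fletcher-4 as ZFS uses it for data blocks: four cascading 64-bit
 * sums over the block's 32-bit words (size assumed a multiple of 4). */
static void
fletcher4(const void *buf, size_t size, struct cksum *out)
{
	const uint32_t *ip = buf;
	const uint32_t *end = ip + size / sizeof (uint32_t);
	uint64_t a = 0, b = 0, c = 0, d = 0;

	for (; ip < end; ip++) {
		a += *ip;
		b += a;
		c += b;
		d += c;
	}
	out->word[0] = a; out->word[1] = b;
	out->word[2] = c; out->word[3] = d;
}

/* Read path: recompute and compare against the checksum stored in
 * the parent block pointer.  Bits flipped in flight by a flaky
 * interposer or backplane fail here even when the drive's own ECC
 * was perfectly happy.  Nonzero means "reread a redundant copy". */
static int
verify_block(const void *buf, size_t size, const struct cksum *stored)
{
	struct cksum actual;

	fletcher4(buf, size, &actual);
	return memcmp(&actual, stored, sizeof (actual)) != 0;
}

int
main(void)
{
	static uint32_t block[1024];	/* pretend 4 KB data block */
	struct cksum stored;
	size_t i;

	for (i = 0; i < 1024; i++)
		block[i] = (uint32_t)i;
	fletcher4(block, sizeof (block), &stored);	/* at write time */

	block[37] ^= 0x10;	/* one bit flipped somewhere in the stack */
	printf("corrupt? %s\n",
	    verify_block(block, sizeof (block), &stored) ?
	    "yes - go read the other mirror" : "no");
	return 0;
}

Because the checksum lives in the parent block and not next to the data, a bad cable, interposer, or controller can't mangle a block and its checksum together the way it could if the checksum rode along in the same sector.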
Not to mention the time I set up a Solaris x86 cluster zoned to a Compellent and would periodically get one or two checksum errors in ZFS. This was the only cluster out of a handful that had issues, and only on that one filesystem. Of course, it was a production PeopleSoft Oracle database. I guess moving to a VMware Linux guest and XFS just swept the problem under the rug, but the hardware is not being reused, so there's that.

> I had heard anecdotes long before (e.g. from Andrew Hume)
> suggesting silent data corruption had become prominent
> enough to matter, but this paper was the first real study
> I came across.
>
> I have used ZFS for my home file server for more than a
> decade; presently on an antique version of Solaris, but
> I hope to migrate to OpenZFS on a newer OS and hardware.
> So far as I can tell ZFS in old Solaris is quite stable
> and reliable. As Ted has said, there are philosophical
> reasons why some prefer to avoid it, but if you don't
> subscribe to those it's a fine answer.
>

Been running Solaris 11.3 and ZFS for quite a few years now, at home. Before that, Solaris 10. I set up a home Red Hat 8 server with ZoL (0.8) earlier this year - so far, no issues, with 40+TB online. I have various test servers with ZoL 2.0 on them, too.

I have so much online data that I use as the "live copy" - going back to early-80s copies of my TOPS-10 stuff. Even though I have copious amounts of LTO tape copies of this data, I won't go back to the "out of sight, out of mind" mentality. Trying to get customers to buy into that idea is another story.

art k.

PS: I refuse to use a workstation that doesn't have ECC RAM, either. I like swiss cheese on a sandwich. I don't like my (or my customers') data emulating it.