The Unix Heritage Society mailing list
From: krewat@kilonet.net (Arthur Krewat)
Subject: [TUHS] UNIX of choice these days?
Date: Wed, 20 Sep 2017 13:31:19 -0400
Message-ID: <568e3829-07d9-22c2-b3bd-b2a6a244bcc9@kilonet.net>
In-Reply-To: <20170920165830.e6lp4acvfpq63n25@matica.foolinux.mooo.com>

Oh, and one more thing that may produce an offshoot, or at least more 
discussion about what's the "right thing to do".

I'm coming off as a Solaris snob, I'm sure, but that's ok ;)

With Solaris and ZFS, everything is checksummed automatically, and
errors can be corrected on the fly. Add raidz2 to that, and most data
corruption can be dealt with.
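
As a rough illustration of the checksum-and-repair idea (a toy Python sketch, not how ZFS is actually implemented): a block is kept as several copies plus a checksum, and on read any copy that fails the checksum is rewritten from a copy that passes. The class name MirroredBlock, the choice of SHA-256, and the two-copy layout are all assumptions made for the example.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Return a hex digest used to detect silent corruption."""
    return hashlib.sha256(data).hexdigest()


class MirroredBlock:
    """Toy model: one logical block stored as N copies plus one checksum."""

    def __init__(self, data: bytes, copies: int = 2):
        self.expected = checksum(data)
        self.copies = [bytearray(data) for _ in range(copies)]
        self.repaired = 0  # number of copies rewritten so far

    def read(self) -> bytes:
        good = None
        bad = []
        # Verify every copy against the stored checksum.
        for i, copy in enumerate(self.copies):
            if checksum(bytes(copy)) == self.expected:
                if good is None:
                    good = bytes(copy)
            else:
                bad.append(i)
        if good is None:
            # No intact copy left: the error can be reported but not fixed.
            raise OSError("checksum mismatch on every copy")
        # Self-heal: overwrite each damaged copy with the verified data.
        for i in bad:
            self.copies[i] = bytearray(good)
            self.repaired += 1
        return good


block = MirroredBlock(b"old source code", copies=2)
block.copies[0][3] ^= 0x01  # simulate a single flipped bit in one copy
data = block.read()         # returns the good data and repairs copy 0
```

The point of the sketch is that the checksum only detects the damage; it is the second copy that makes repair possible.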

Which brings up my experience with bit-rot.

Two stories:

1) Home server, using SAS to some Dell MD1000s with SATA drives in them
(through SAS->SATA interposers). I found that one of the controllers in
one of the MD1000s was corrupting data. At its height, I was getting on
average one or two checksum errors in ZFS a day. I didn't notice until
ZFS actually errored out a disk because of it, and the raidz2 zpool went
DEGRADED. By the time I dealt with it, I had a few hundred errors
showing in the zpool status.

It was pretty obvious which MD1000 controller was causing the issue
because almost every drive on that particular controller was reporting
errors at the same time. But it happened at a level where the data was
actually being corrupted "in flight", in such a way that the SAS
controller in the server didn't see any protocol errors; it was really
data corruption at the sector level.

2) Work server, M1000e chassis with an Oracle Solaris cluster on a pair
of M610 blades, Emulex fiber controllers, Brocade 5100 switches, and a
Dell Compellent. Twice in two years, ZFS noticed a checksum error in a
record of a file. One was a redo log that had already been read before 
it errored, and the other was a flashback log that wasn't necessary for 
continued operation of the database.

This one, I'm not so sure isn't a bug in firmware (or even Solaris)
somewhere along the path. One error happened on one node, the other
error happened on the other node. Two different types of databases - one
Student Information System, the other online learning. The QA cluster
never saw any issues.

Problem with this is, I'm using ZFS on top of a SAN - so there's no 
mirroring or raidz# going on, it's all on the SAN to deal with errors. 
Once ZFS sees corruption, the file goes into "I/O error".
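
The detect-only case can be sketched the same way (again a toy Python example, not ZFS code): with a single copy of the data there is nothing to repair from, so a failed checksum can only be surfaced to the caller as an I/O error. The function name read_verified and the use of SHA-256 are assumptions for the example.

```python
import errno
import hashlib


def read_verified(stored: bytes, expected_sha256: str) -> bytes:
    """Return the data only if it still matches its recorded checksum."""
    if hashlib.sha256(stored).hexdigest() != expected_sha256:
        # No mirror or parity to fall back on: report EIO instead of
        # handing corrupted bytes back to the application.
        raise OSError(errno.EIO, "checksum mismatch, no redundant copy")
    return stored


original = b"redo log record"
digest = hashlib.sha256(original).hexdigest()

intact = read_verified(original, digest)  # passes verification

corrupted = b"redo log recorc"            # same length, last byte changed
try:
    read_verified(corrupted, digest)
    got_io_error = False
except OSError as exc:
    got_io_error = exc.errno == errno.EIO
```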

--

Both these stories point out that bit-rot is really a thing. I refuse to
store any of my own personal/work/whatever data on a machine that
doesn't do ECC for RAM, or on filesystems that do not checksum. I have a
lot of old data and source code stored on my array. I would hate to open 
an old source file and see a corrupted sector right in the middle of it. 
I've seen it happen to other people. I've seen it happen to me 20 years 
ago. Never again.

I back everything up to an LTO4 library, and regularly take
infinite-retention backups and store them off-site. I also recently
started up an Amazon EC2 instance in Ireland and rsync stuff to that
using "magnetic" storage (spinning disk) - which is relatively cheap.

Anyone know of a reliable filesystem that checksums everything? Oh wait,
ZFS is available for Linux - wonder if I can install it on an Amazon
t2.micro instance? I'll have to check.

