From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <3B953ECD.473D3718@princeton.edu>
From: Martin Harriss <martin@Princeton.EDU>
MIME-Version: 1.0
To: 9fans@cse.psu.edu
Subject: Re: [9fans] (no subject)
References: <20010904130843.027A419A38@mail.cse.psu.edu>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Date: Tue,  4 Sep 2001 16:51:25 -0400
Topicbox-Message-UUID: e9a72ffe-eac9-11e9-9e20-41e7f4b1d025

"Fco.J.Ballesteros" wrote:
>
> That's cool. BTW, I'm just wondering...

Indeed it is.

> Would a system crash while doing a (mirrored) dump
> corrupt both disks at the same time?

 From reading Geoff's code it looks as if a crash during a block write
could leave the secondaries in a good state, but the master would be
incomplete.

> While I was trying to use a couple of ide disks to
> survive disk crashes, I thought it would be better
> not to use a real mirror because a crash at a bad
> moment could perhaps leave the cached worm unusable.
> (I had a crash while doing a dump on a cached worm and
>  the cached worm became unusable).

In the case of the worm file system, what happens if a sequence of
writes is interrupted?  Is the file system guaranteed to be consistent
after each write, or can an interrupted sequence destroy the file
system?  The experience above would suggest the latter.

> Am I missing something? A recovery procedure I don't
> know of? Or perhaps some code in the mirror device tries
> to deal with that?

I don't see any code in there to do that.  I've also been playing with
disk mirroring, and I added a 'resilver' command to copy a 'good' disk
to a 'bad' disk.  The hard part is knowing which is the good disk and
which is the bad.  If one of them goes bad while the system is running
it's easy: when you get the error you just mark that disk as bad.  But
when the power goes off it's impossible to know just where you were in
the sequence of writes.  For something like a fake worm, where speed is
not a big issue, it may be worthwhile to record disk status in some
dedicated place on the disk for each block write.  One logical
volume manager that I am familiar with just "guesses" which side of the
mirror is good and lets fsck clean up the mess, but this is probably
impractical for a "worm" file system.
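
As a rough sketch of what recording status per block write might look
like: each side of the mirror could keep a sequence number and a dirty
flag in its status area.  This is not Geoff's code and not my resilver
command; the names Status, Disk, mirrorwrite, pickgood and resilver are
made up for illustration, and the disks are simulated in memory.

```c
#include <assert.h>
#include <string.h>

enum { NBLK = 8, BSIZE = 16 };	/* tiny sizes, for illustration only */

typedef struct Status {
	unsigned long seq;	/* count of block writes started on this disk */
	int dirty;		/* 1 while a block write is in flight */
} Status;

typedef struct Disk {
	Status st;			/* stands in for the dedicated status area */
	unsigned char blk[NBLK][BSIZE];	/* stands in for the data blocks */
} Disk;

/*
 * Write block b to each side in turn: record intent, write the data,
 * record completion.  steps caps the total number of steps performed,
 * to simulate power failing part way through; returns -1 if cut short.
 */
int
mirrorwrite(Disk *d, int ndisk, int b, const unsigned char *buf, int steps)
{
	int i;

	assert(b >= 0 && b < NBLK);
	for(i = 0; i < ndisk; i++){
		if(steps-- <= 0)
			return -1;
		d[i].st.seq++;			/* step 1: record intent */
		d[i].st.dirty = 1;
		if(steps-- <= 0)
			return -1;
		memcpy(d[i].blk[b], buf, BSIZE);	/* step 2: write the data */
		if(steps-- <= 0)
			return -1;
		d[i].st.dirty = 0;		/* step 3: record completion */
	}
	return 0;
}

/*
 * After a restart: the clean disk with the highest seq is taken as
 * the good one (lowest index wins a tie); -1 if no disk is clean.
 */
int
pickgood(Disk *d, int ndisk)
{
	int i, good;

	good = -1;
	for(i = 0; i < ndisk; i++)
		if(!d[i].st.dirty && (good < 0 || d[i].st.seq > d[good].st.seq))
			good = i;
	return good;
}

/* Copy the good disk, status included, over the bad one. */
void
resilver(const Disk *good, Disk *bad)
{
	*bad = *good;
}
```

If the power fails while the first disk is still marked dirty, pickgood
chooses the other disk and resilver rolls the interrupted write back.
The price is two extra status writes per disk for every block written,
which is why I'd only consider it where speed is not a big issue.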

> Another question I have is can you rebuild your mirror
> device just by raw copying of one disk into another?

It looks that way from the code.  In fact, I've pretty much decided that
I'm going to build a file server with a cast-off 300-or-so megabyte disk
(in addition to the real file server disks) that would contain the
bootstrap for the file server, and an 'emergency' stand-alone cpu server
that could be used to repair/copy/etc the actual file server disks.  I
can also see writing some tools to fix corrupted fake worms.

Martin

>   ------------------------------------------------------------------------
>
> Subject: [9fans] (no subject)
> Date: Tue, 4 Sep 2001 05:08:33 -0700
> From: geoff@collyer.net
> Reply-To: 9fans@cse.psu.edu
> To: 9fans@collyer.net
>
> I've fixed some bugs in the IDE file server (some latent, some new in
> the IDE code) and added a mirroring device.  I've tested it, it works
> and later today it will be my main file server.  The mirroring device
> is really very little code; the file server's elegant design is
> largely responsible for this.  Doing a dump of 457121 4K blocks from a
> cache device on h0 to a fake worm also on h0, mirrored on h1.0.0
> (a.k.a. h2) took 73 minutes, so I got 25,648,871 bytes per minute
> throughput.  I verified that the copy on h1.0.0 was correct.
>
> Here's my configuration:
>
>         config h0
>         service    fs
>         [ uninteresting ip configuration omitted ]
>         filsys main cp(h0)0.25f{p(h0)25.75p(h1.0.0)25.75}
>         filsys dump o
>         filsys other p(h1.0.0)0.25
>         ream other
>         ream main
>         end
>
> {} is the mirror device, analogous to () and [].  The first device
> inside {} is the master, any others are mirrors.  The code can be
> found at www.collyer.net/~geoff/9/.  I'll add some commentary on the
> changes later today.  They fall into several categories:
>
> - fixes to latent bugs.
> - addition of some missing switch cases for Devfworm, Devnone and Devide.
>   the file server could really use a device switch (rather than a lot of
>   switch statements scattered throughout the code).
>   in particular, device configuration strings are now printed better.
> - additional paranoia in the IDE code; specifying a non-existent drive
>   no longer causes a kernel page fault.
> - converted nemo's style back to the original style, and some tidying up.
> - probably vestigial paranoia traceable to hunting down the fpinit bug.
> - local configuration (e.g., timezone).  you'll want to crank MAXMEG up.
> - addition of the mirror device.
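
The throughput figure in Geoff's message can be checked with a little
arithmetic, assuming "4K" means 4096 bytes: 457121 blocks is
1,872,367,616 bytes, and dividing by 73 minutes gives 25,648,871 bytes
per minute (truncated), or roughly 427,000 bytes per second.  The
helper below, bytespermin, is a made-up name used only to show the
calculation.

```c
#include <assert.h>

/*
 * Bytes per minute for a dump of nblocks blocks of bsize bytes that
 * took the given number of minutes.  Integer division, so the result
 * is truncated rather than rounded.
 */
unsigned long long
bytespermin(unsigned long long nblocks, unsigned long long bsize,
	unsigned long long minutes)
{
	assert(minutes > 0);
	return nblocks * bsize / minutes;
}
```

Calling bytespermin(457121, 4096, 73) reproduces the number quoted.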