From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Wed,  7 Jan 2009 16:36:52 -0800
From: "Roman V. Shaposhnik" <rvs@sun.com>
In-reply-to: <adbe04b607132eb0669582417a7ec312@quanstro.net>
To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net>
Message-id: <1231375012.5141.205.camel@goose.sun.com>
MIME-version: 1.0
Content-type: text/plain
Content-transfer-encoding: 7BIT
References: <adbe04b607132eb0669582417a7ec312@quanstro.net>
Subject: Re: [9fans] Changelogs & Patches?
Topicbox-Message-UUID: 7bce2626-ead4-11e9-9d60-3106f5b1d025

On Tue, 2009-01-06 at 18:44 -0500, erik quanstrom wrote:
> >> a big difference between the decisions is in data integrity.
> >> it's much easier to break a fs that rewrites than it is a
> >> worm-based fs.
> >
> > True. But there's a grey area here: an FS that *never* rewrites
> > live blocks, but can reclaim dead ones. That's essentially
> > what ZFS does.
>
> unfortunately, i would think that can result in data loss since
> i can no longer take a set of copies of the fs {fs_0, ... fs_n}
> and create a new copy with all the data possibly recovered
> by picking a set of "good" blocks from the fs_i, since i can make
> a block dead by removing the file using it and i can make it
> live again by writing a new file.
>
> perhaps i've misinterpreted what you are saying?

Let's see. Maybe it's my misinterpretation of what venti does. But so
far I understand that it boils down to: I give venti a block of any
length, it gives me a score back. Now internally, venti might decide
to split that huge block into a series of smaller ones and store it
as a tree. But still all I get back is a single score. I don't care
whether that score really describes my raw data block, or a block full
of scores that actually describe raw data. All I care is that when
I give venti that score back -- it'll reconstruct the data. I also
have a guarantee that the data will never ever be deleted.

Now, because of that guarantee (blocks are never deleted) and since
all blocks bigger than 56k get split, venti has a nice property of
reusing blocks from existing trees. This happens as a by-product
of the design: I ask venti to store a block and if that same block
was already there -- there will be an extra arrow pointing at it.
All in all -- a very compact way of representing a forest of trees.
Each tree corresponds to a VtEntry data structure and blocks full
of VtEntry structures are called VtEntryDirs. Finally, a root
VtEntryDir is pointed at by a VtRoot structure.
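
To make the "extra arrow" point concrete, here is a toy sketch of a
score-addressed store. It is not venti's code: the class and method
names are made up for illustration, and the splitting of large blocks
into trees is left out. The only thing it shows is that storing the
same block twice yields the same score and a single stored copy.

```python
import hashlib


class ToyScoreStore:
    """Hypothetical write-once store keyed by the SHA-1 of the contents."""

    def __init__(self):
        self.blocks = {}  # score (hex string) -> block contents

    def write(self, data: bytes) -> str:
        score = hashlib.sha1(data).hexdigest()
        # An identical block is stored only once; blocks are never deleted.
        self.blocks.setdefault(score, data)
        return score

    def read(self, score: str) -> bytes:
        return self.blocks[score]


store = ToyScoreStore()
first = store.write(b"hello, 9fans")
second = store.write(b"hello, 9fans")  # same contents, same score
```

Both writes return the same score and the store holds one block, so
two trees that contain this block end up pointing at the same place
without anyone asking for it.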

Contrast this with ZFS, where blocks are *not* addressed via scores,
but rather with vdev:offset pairs called DVAs. This, of course,
means that there's no block coalescing going on. You ask ZFS to store
a block, it gives you a DVA back. You ask it to store the same block
again, you get a different DVA (well, actually it gives you a block
pointer, which is a DVA augmented by extra stuff).
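
For contrast, a toy sketch of location-based addressing. Again, this
is not ZFS code: ToyPool and its bump allocator are assumptions made
up for illustration, and the "DVA" here is just a (vdev, offset)
tuple without the extra block-pointer fields.

```python
class ToyPool:
    """Hypothetical allocator that addresses blocks by (vdev, offset)."""

    def __init__(self):
        self.next_offset = 0
        self.blocks = {}  # (vdev, offset) -> block contents

    def write(self, data: bytes, vdev: int = 0):
        # Every write gets fresh space; the contents are never inspected.
        dva = (vdev, self.next_offset)
        self.blocks[dva] = data
        self.next_offset += len(data)
        return dva

    def read(self, dva):
        return self.blocks[dva]


pool = ToyPool()
dva_a = pool.write(b"same bytes")
dva_b = pool.write(b"same bytes")  # identical contents, different address
```

Here the two writes give two different addresses and two stored
copies, so sharing a block between trees has to be done explicitly by
copying the pointer.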

That fundamental property of ZFS makes it impossible to have a
single block implicitly referenced by multiple trees, unless the
block happens to be part of an explicit snapshot of the same object
at some later point in time.

Thus, when there's a need to modify an existing object, ZFS never
touches the old blocks. It builds a new tree of blocks, *explicitly*
reusing those blocks that haven't changed. When it is done building
the new tree the old one is still the active one. The last transaction
that happens updates an uberblock (ZFS speak for VtRoot) in an atomic
fashion, thus making the new tree the active one. The old tree is still
around at that point. If it is not part of a snapshot it can be
"garbage collected" and its blocks can be freed; if it is part of a
snapshot -- it is preserved. In the latter case the behavior seems
to be exactly what venti does.
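
The update sequence described above can be sketched as a toy model.
Everything here is hypothetical (ToyCowFs, a flat tuple of addresses
standing in for the tree, an integer counter standing in for the
allocator); it only mirrors the order of events: write new blocks,
switch the root in one step, then free whatever neither the active
root nor any snapshot references.

```python
class ToyCowFs:
    """Hypothetical copy-on-write object with snapshots."""

    def __init__(self, blocks):
        self.store = {}      # address -> block contents
        self.next_addr = 0
        self.snapshots = []  # preserved old roots
        # The "tree" is flattened to a tuple of block addresses.
        self.root = tuple(self._alloc(b) for b in blocks)

    def _alloc(self, data):
        addr = self.next_addr
        self.next_addr += 1
        self.store[addr] = data
        return addr

    def snapshot(self):
        self.snapshots.append(self.root)

    def update(self, index, data):
        new_root = list(self.root)            # reuse unchanged addresses
        new_root[index] = self._alloc(data)   # old block is not touched
        self.root = tuple(new_root)           # the single root switch
        self._reclaim()

    def _reclaim(self):
        # A block is dead only if no root, active or snapshot, points at it.
        live = set(self.root)
        for snap in self.snapshots:
            live.update(snap)
        for addr in list(self.store):
            if addr not in live:
                del self.store[addr]

    def read(self, root=None):
        root = self.root if root is None else root
        return [self.store[a] for a in root]


fs = ToyCowFs([b"a", b"b"])
fs.snapshot()
fs.update(1, b"B")   # old b"b" survives: the snapshot still points at it
fs.update(1, b"C")   # b"B" was never snapshotted, so it is reclaimed
```

In this model the snapshot still reads back the original two blocks,
the unchanged first block is shared between the snapshot and the
active root, and only the block that no root references is freed.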

But even in the former case I don't see how the corruption could be
possible. Please elaborate.

Thanks,
Roman.