9fans - fans of the OS Plan 9 from Bell Labs
From: "Roman V. Shaposhnik" <rvs@sun.com>
To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net>
Subject: Re: [9fans] Changelogs & Patches?
Date: Wed,  7 Jan 2009 16:36:52 -0800
Message-ID: <1231375012.5141.205.camel@goose.sun.com>
In-Reply-To: <adbe04b607132eb0669582417a7ec312@quanstro.net>

On Tue, 2009-01-06 at 18:44 -0500, erik quanstrom wrote:
> >> a big difference between the decisions is in data integrity.
> >> it's much easier to break a fs that rewrites than it is a
> >> worm-based fs.
> >
> > True. But there's a grey area here: an FS that *never* rewrites
> > live blocks, but can reclaim dead ones. That's essentially
> > what ZFS does.
>
> unfortunately, i would think that can result in data loss since
> i can no longer take a set of copies of the fs {fs_0, ... fs_n}
> and create a new copy with all the data possibly recovered
> by picking a set of "good" blocks from the fs_i, since i can make
> a block dead by removing the file using it and i can make it
> live again by writing a new file.
>
> perhaps i've misinterpreted what you are saying?

Let's see. Maybe it's my misinterpretation of what venti does. But so
far I understand that it boils down to this: I give venti a block of any
length, and it gives me a score back. Now internally, venti might decide
to split that huge block into a series of smaller ones and store it
as a tree. But still all I get back is a single score. I don't care
whether that score really describes my raw data block, or a block full
of scores that actually describe raw data. All I care about is that when
I give venti that score back, it'll reconstruct the data. I also
have a guarantee that the data will never ever be deleted.
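
To make that contract concrete, here is a minimal sketch of a write-once,
content-addressed store. It is a toy, not venti's actual code: ToyVenti and
its method names are made up for illustration, and the only detail taken
from venti is that a score is the SHA-1 hash of the block.

```python
import hashlib


class ToyVenti:
    """Toy write-once, content-addressed block store (hypothetical, not venti)."""

    def __init__(self):
        self.blocks = {}  # score (hex SHA-1) -> block bytes

    def write(self, data: bytes) -> str:
        score = hashlib.sha1(data).hexdigest()
        # Storing the same bytes twice is a no-op: same score, same block.
        self.blocks.setdefault(score, data)
        return score

    def read(self, score: str) -> bytes:
        data = self.blocks[score]
        # The score doubles as an integrity check on the way out.
        if hashlib.sha1(data).hexdigest() != score:
            raise ValueError("block does not match its score")
        return data
```

There is deliberately no delete method: the interface only ever adds blocks.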

Now, because of that guarantee (blocks are never deleted) and since
all blocks bigger than 56k get split, venti has a nice property of
reusing blocks from existing trees. This happens as a by-product
of the design: I ask venti to store a block and if that same block
was already there -- there will be an extra arrow pointing at it.
All in all -- a very compact way of representing a forest of trees.
Each tree corresponds to a VtEntry data structure, and blocks full
of VtEntry structures are called VtEntryDirs. Finally, a root
VtEntryDir is pointed at by a VtRoot structure.
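
The block sharing described above can be sketched with a one-level tree.
This is only an illustration under loud assumptions: the 4-byte CHUNK size,
the flat pointer block, and the names put, put_tree and get_tree are all
invented here and do not reflect venti's on-disk format or the VtEntry
layout.

```python
import hashlib

CHUNK = 4       # tiny chunk size for illustration only (assumption)
SCORE_LEN = 20  # a SHA-1 digest is 20 bytes

store = {}  # score -> block bytes


def put(data: bytes) -> bytes:
    """Store one block and return its score; identical blocks are stored once."""
    score = hashlib.sha1(data).digest()
    store.setdefault(score, data)
    return score


def put_tree(data: bytes) -> bytes:
    """Store data as leaf chunks plus one pointer block; return the root score."""
    leaves = [put(data[i:i + CHUNK]) for i in range(0, len(data), CHUNK)]
    return put(b"".join(leaves))  # pointer block = concatenated leaf scores


def get_tree(root: bytes) -> bytes:
    """Rebuild the original data from a root score."""
    pointers = store[root]
    return b"".join(
        store[pointers[i:i + SCORE_LEN]]
        for i in range(0, len(pointers), SCORE_LEN)
    )
```

Storing b"AAAABBBBCCCC" and then b"AAAABBBBDDDD" leaves six blocks in the
store rather than eight: the AAAA and BBBB leaves are shared by both trees
without either caller asking for it.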

Contrast this with ZFS, where blocks are *not* addressed via scores,
but rather with vdev:offset pairs called DVAs. This, of course,
means that there's no block coalescing going on. You ask ZFS to store
a block, and it gives you a DVA back. You ask it to store the same block
again, and you get a different DVA (well, actually it gives you a block
pointer, which is a DVA augmented by extra stuff).
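
For contrast, a location-addressed store can be sketched as below. Again a
toy under stated assumptions: ToyPool, its single vdev, and the
(vdev, offset, size) tuple standing in for a DVA are hypothetical and have
nothing to do with ZFS's real allocator or block pointer format.

```python
class ToyPool:
    """Toy location-addressed store: the address says where, not what."""

    def __init__(self):
        self.vdev = bytearray()  # a single pretend device

    def write(self, data: bytes):
        offset = len(self.vdev)  # always allocate fresh space at the end
        self.vdev += data
        return (0, offset, len(data))  # stand-in for a DVA: (vdev, offset, size)

    def read(self, dva) -> bytes:
        _, offset, size = dva
        return bytes(self.vdev[offset:offset + size])
```

Writing the same bytes twice returns two different addresses and consumes
the space twice, which is the absence of coalescing mentioned above.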

That fundamental property of ZFS makes it impossible to have a
single block implicitly referenced by multiple trees, unless the
block happens to be part of an explicit snapshot of the same object
at some later point in time.

Thus, when there's a need to modify an existing object, ZFS never
touches the old blocks. It builds a new tree of blocks, *explicitly*
reusing those blocks that haven't changed. When it is done building
the new tree, the old one is still the active one. The last transaction
that happens updates an uberblock (ZFS speak for VtRoot) in an atomic
fashion, thus making the new tree the active one. The old tree is still
around at that point. If it is not part of a snapshot, it can be
"garbage collected" and its blocks can be freed; if it is part of a
snapshot, it is preserved. In the latter case the behavior seems
to be exactly what venti does.
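
The update sequence in the paragraph above can be sketched as follows. This
is a deliberately small model, not ZFS: ToyCow, its two-level tree (one
root block holding a tuple of leaf addresses), and the eager _collect step
are all assumptions made for illustration.

```python
class ToyCow:
    """Toy copy-on-write store: each version is a root holding leaf addresses."""

    def __init__(self, leaves):
        self.blocks = {}     # address -> contents; live blocks are never overwritten
        self.next_addr = 0
        self.snapshots = []  # root addresses pinned by snapshots
        self.uberblock = self._alloc(tuple(self._alloc(d) for d in leaves))

    def _alloc(self, contents):
        addr = self.next_addr
        self.next_addr += 1
        self.blocks[addr] = contents
        return addr

    def snapshot(self):
        self.snapshots.append(self.uberblock)

    def update(self, index, data):
        leaves = list(self.blocks[self.uberblock])
        leaves[index] = self._alloc(data)      # new leaf; the old one is untouched
        new_root = self._alloc(tuple(leaves))  # explicitly reuses unchanged leaves
        self.uberblock = new_root              # the single switch to the new tree
        self._collect()

    def _collect(self):
        # Free every block not reachable from the active root or a snapshot.
        roots = set(self.snapshots) | {self.uberblock}
        live = set(roots)
        for root in roots:
            live.update(self.blocks[root])
        for addr in list(self.blocks):
            if addr not in live:
                del self.blocks[addr]

    def read(self, root=None):
        root = self.uberblock if root is None else root
        return [self.blocks[addr] for addr in self.blocks[root]]
```

In this model an update without a snapshot frees the old root and the
replaced leaf, while an update after snapshot() keeps them, so the
snapshot's root still reads back the old contents.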

But even in the former case I don't see how the corruption could be
possible. Please elaborate.

Thanks,
Roman.



