From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 7 Jan 2009 16:36:52 -0800
From: "Roman V. Shaposhnik"
In-reply-to:
To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net>
Message-id: <1231375012.5141.205.camel@goose.sun.com>
MIME-version: 1.0
Content-type: text/plain
Content-transfer-encoding: 7BIT
References:
Subject: Re: [9fans] Changelogs & Patches?
Topicbox-Message-UUID: 7bce2626-ead4-11e9-9d60-3106f5b1d025

On Tue, 2009-01-06 at 18:44 -0500, erik quanstrom wrote:
> >> a big difference between the decisions is in data integrety.
> >> it's much easier to break a fs that rewrites than it is a
> >> worm-based fs.
> >
> > True. But there's a grey area here: an FS that *never* rewrites
> > live blocks, but can reclaim dead ones. That's essentially
> > what ZFS does.
>
> unfortunately, i would think that can result in data loss since
> i can can no longer take a set of copies of the fs {fs_0, ... fs_n}
> and create a new copy with all the data possibly recovered
> by picking a set "good" blocks from the fs_i, since i can make
> a block dead by removing the file using it and i can make it
> live again by writing a new file.
>
> perhaps i've misinterpreted what you are saying?

Let's see. Maybe it's my misinterpretation of what venti does. But as
far as I understand, it boils down to this: I give venti a block of
any length and it gives me a score back. Internally, venti might
decide to split that huge block into a series of smaller ones and
store it as a tree. But still, all I get back is a single score. I
don't care whether that score really describes my raw data block, or
a block full of scores that actually describe raw data. All I care
about is that when I give venti that score back, it'll reconstruct
the data. I also have a guarantee that the data will never ever be
deleted.

Now, because of that guarantee (blocks are never deleted) and since
all blocks bigger than 56k get split, venti has a nice property of
reusing blocks from existing trees.
This happens as a by-product of the design: I ask venti to store a
block, and if that same block was already there, there will be an
extra arrow pointing at it. All in all, it is a very compact way of
representing a forest of trees. Each tree corresponds to a VtEntry
data structure, and blocks full of VtEntry structures are called
VtEntryDir's. Finally, a root VtEntryDir is pointed at by a VtRoot
structure.

Contrast this with ZFS, where blocks are *not* addressed via scores,
but rather with vdev:offset pairs called DVAs. This, of course, means
that there's no block coalescing going on. You ask ZFS to store a
block and it gives you a DVA back. You ask it to store the same block
again and you get a different DVA (well, actually it gives you a
block pointer, which is a DVA augmented by extra stuff). That
fundamental property of ZFS makes it impossible to have a single
block implicitly referenced by multiple trees, unless the block
happens to be part of an explicit snapshot of the same object at some
later point in time.

Thus, when there's a need to modify an existing object, ZFS never
touches the old blocks. It builds a new tree of blocks, *explicitly*
reusing those blocks that haven't changed. While it is building the
new tree, the old one is still the active one. The last transaction
that happens updates an uberblock (ZFS speak for VtRoot) in an atomic
fashion, thus making the new tree the active one. The old tree is
still around at that point. If it is not part of a snapshot, it can
be "garbage collected" and its blocks can be freed; if it is part of
a snapshot, it is preserved. In the latter case the behavior seems to
be exactly what venti does. But even in the former case I don't see
how the corruption could be possible. Please elaborate.

Thanks,
Roman.
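
P.S. To make the comparison above concrete, here is a minimal Python
sketch of the two addressing schemes as I understand them. It is a toy
under stated assumptions, not venti or ZFS code: ScoreStore,
OffsetStore and rewrite_block are names made up for illustration, the
"tree" is flattened to a tuple of block addresses, and SHA-1 stands in
for the score.

```python
import hashlib


class ScoreStore:
    """Toy content-addressed store: the address (score) is the SHA-1 of the data."""

    def __init__(self):
        self.blocks = {}

    def write(self, data: bytes) -> str:
        score = hashlib.sha1(data).hexdigest()
        # Storing an identical block again changes nothing: same score, same slot.
        self.blocks.setdefault(score, data)
        return score

    def read(self, score: str) -> bytes:
        return self.blocks[score]


class OffsetStore:
    """Toy location-addressed store: the address is a (vdev, offset) pair."""

    def __init__(self, vdev: int = 0):
        self.vdev = vdev
        self.next_offset = 0
        self.blocks = {}

    def write(self, data: bytes) -> tuple:
        # Every write lands at a fresh offset, even if the data is a duplicate.
        dva = (self.vdev, self.next_offset)
        self.blocks[dva] = data
        self.next_offset += len(data)
        return dva

    def read(self, dva: tuple) -> bytes:
        return self.blocks[dva]


def rewrite_block(store, tree, index, data):
    """Copy-on-write update: return a new tree that reuses unchanged addresses."""
    new_tree = list(tree)
    new_tree[index] = store.write(data)  # the old block is left untouched
    return tuple(new_tree)


# Same data twice: one score and one stored copy in the score-based store.
venti = ScoreStore()
s1 = venti.write(b"hello")
s2 = venti.write(b"hello")

# Same data twice: two distinct addresses in the offset-based store.
zfs = OffsetStore()
d1 = zfs.write(b"hello")
d2 = zfs.write(b"hello")

# Modify block 1 of a two-block object without touching the old tree.
old_tree = (zfs.write(b"aaaa"), zfs.write(b"bbbb"))
new_tree = rewrite_block(zfs, old_tree, 1, b"cccc")
```

Switching the "active" tree then amounts to replacing a single
reference from old_tree to new_tree. Until something explicitly frees
the old blocks, both trees remain fully readable.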