9fans - fans of the OS Plan 9 from Bell Labs
* [9fans] Ideas for gc on venti
@ 2008-06-18 19:35 Enrico Weigelt
  2008-06-18 20:16 ` Russ Cox
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Enrico Weigelt @ 2008-06-18 19:35 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs


Hi folks,


as I'm using venti as the storage backend for a media archive, where
content can be deleted (and deletions will probably happen often enough),
I'm currently thinking about how garbage collection could be
achieved.

Let's assume the following premise:

* only a few well-known apps are writing to venti (eg. only
  vac and vtstore).
* we know all the root scores and can iterate through the
  metadata from time to time.
* venti's storage is divided into several logs of moderate size
  (eg. 2GB).

Now we introduce a "deprecated" mode for a volume: no more
writes to that volume, requested blocks are automatically moved
to another volume (and cleared from the deprecated one). Maybe
from time to time a compaction process might run which
removes the holes in the volume.

Well, that's not yet any form of gc - just a smooth migration of data
from one volume to another - also good if you intend to take some
disk offline in the near future, w/o serious interruption.
(The deprecated volume gets emptier and emptier, and no new
data is added.)

GC is the next step:

Assuming each block to keep is accessed at least once within some given
time, we'll know that the remaining data on the volume will be
trash after that time. So all we've got to do is to iterate
through all archives and access all their blocks (*1). Once this
is completely done, the deprecated volume only contains trash and
can be safely deleted.


What do you think about that approach ?

cu

*1) we could introduce a new "touch" rpc call, which simply tells
venti that some list of blocks is still required, but does not
send back their data.
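
A minimal sketch of the scheme above, in Python rather than venti's C.
Everything in it is hypothetical: `Store`, `touch`, `deprecate`,
`drop_deprecated`, `write_archive` and `mark` are illustrative names, not
venti's API, volumes are plain dictionaries instead of arena logs, and an
archive is simplified to a single root block listing the scores of its
data blocks.

```python
import hashlib

SCORESIZE = 20  # a SHA-1 score is 20 bytes


def score(block: bytes) -> bytes:
    return hashlib.sha1(block).digest()


class Store:
    """Toy store: one live volume and one deprecated volume, both dicts."""

    def __init__(self):
        self.live = {}
        self.deprecated = {}

    def write(self, block: bytes) -> bytes:
        s = score(block)
        if not self.touch(s):          # new block: goes to the live volume only
            self.live[s] = block
        return s

    def touch(self, s: bytes) -> bool:
        """Hypothetical 'touch' call: rescue the block, send no data back."""
        if s in self.deprecated:
            self.live[s] = self.deprecated.pop(s)
        return s in self.live

    def read(self, s: bytes) -> bytes:
        self.touch(s)                  # a requested block is moved first
        return self.live[s]

    def deprecate(self):
        """Freeze the current volume and start writing to a fresh one."""
        assert not self.deprecated, "previous deprecated volume not yet dropped"
        self.deprecated, self.live = self.live, {}

    def drop_deprecated(self) -> int:
        """Delete whatever was never touched; return the number of blocks."""
        n = len(self.deprecated)
        self.deprecated.clear()
        return n


def write_archive(store, chunks):
    """One-level archive: the root block is the concatenated child scores."""
    return store.write(b"".join(store.write(c) for c in chunks))


def mark(store, root):
    """Read the root to find its children, then touch each child."""
    body = store.read(root)
    for i in range(0, len(body), SCORESIZE):
        store.touch(body[i:i + SCORESIZE])
```

To use it: write some archives, call `deprecate()`, run `mark()` for every
root that should survive, then call `drop_deprecated()`. In this sketch the
safety of the last step rests entirely on the premise stated above, that all
root scores are known and have been marked first.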

--
----------------------------------------------------------------------
 Enrico Weigelt, metux IT service -- http://www.metux.de/

 cellphone: +49 174 7066481   email: info@metux.de   skype: nekrad666
----------------------------------------------------------------------
 Embedded-Linux / Portierung / Opensource-QM / Verteilte Systeme
----------------------------------------------------------------------




* Re: [9fans] Ideas for gc on venti
  2008-06-18 19:35 [9fans] Ideas for gc on venti Enrico Weigelt
@ 2008-06-18 20:16 ` Russ Cox
  2008-06-18 20:50   ` Enrico Weigelt
  2008-06-18 20:32 ` erik quanstrom
       [not found] ` <d39497d76feddcee629a4ea8c7af63d9@quanstro.net>
  2 siblings, 1 reply; 11+ messages in thread
From: Russ Cox @ 2008-06-18 20:16 UTC (permalink / raw)
  To: weigelt, 9fans

> What do you think about that approach ?

I think you will lose your data.

The greatest strength of venti, and also of
the worm file system, is that once data is written,
those disk blocks are never changed again.
That makes it virtually impossible to lose data
due to software or human errors.  This is no small thing.

Why not just use an ordinary file system?
What benefit are you deriving from using venti
that is making all this rewriting worthwhile?

If it's just that when two people upload the same
file, you don't store it twice, you could just store
files named by their SHA1 hashes in an ordinary
file system.
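
A rough sketch of what that could look like, in Python. The function names
`put`, `get` and `delete` and the flat directory layout are assumptions made
for illustration; nothing here comes from venti or from an existing tool.

```python
import hashlib
import os


def put(root: str, data: bytes) -> str:
    """Store data under its SHA-1 hex digest; a second upload is a no-op."""
    name = hashlib.sha1(data).hexdigest()
    path = os.path.join(root, name)
    if not os.path.exists(path):
        tmp = path + ".tmp"
        with open(tmp, "wb") as f:
            f.write(data)
        os.replace(tmp, path)  # rename into place so readers never see a partial file
    return name


def get(root: str, name: str) -> bytes:
    with open(os.path.join(root, name), "rb") as f:
        return f.read()


def delete(root: str, name: str) -> None:
    """Reclaim the space; the caller must know no other record uses the name."""
    os.remove(os.path.join(root, name))
```

Deletion is then an ordinary file removal, but the application has to track
how many of its records point at each name before calling `delete`.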

Russ




* Re: [9fans] Ideas for gc on venti
  2008-06-18 19:35 [9fans] Ideas for gc on venti Enrico Weigelt
  2008-06-18 20:16 ` Russ Cox
@ 2008-06-18 20:32 ` erik quanstrom
       [not found] ` <d39497d76feddcee629a4ea8c7af63d9@quanstro.net>
  2 siblings, 0 replies; 11+ messages in thread
From: erik quanstrom @ 2008-06-18 20:32 UTC (permalink / raw)
  To: weigelt, 9fans

> Well, that's not yet any form of gc - just a smooth migration of data
> from one volume to another - also good if you intend to take some
> disk offline in the near future, w/o serious interruption.
> (The deprecated volume gets emptier and emptier, and no new
> data is added.)

in the original venti paper, the problems associated with disk
management, redundancy and backup were ignored so they
could be handled separately.

i think this is good design.  but i can't take credit for this
opinion.  i've had kernighan & plauger, elements of programming
style on my desk for a few days.  this is a book old enough to give
examples in pl/1 but i think it still gives advice which bears repeating.

one of the suggestions is that each function should hide something
important.

it makes sense for the storage management function to present an
idealized block device while hiding details like disk replacement
and redundancy.

now, if i could get all my own functions to live up to this standard....

- erik





* Re: [9fans] Ideas for gc on venti
  2008-06-18 20:16 ` Russ Cox
@ 2008-06-18 20:50   ` Enrico Weigelt
  0 siblings, 0 replies; 11+ messages in thread
From: Enrico Weigelt @ 2008-06-18 20:50 UTC (permalink / raw)
  To: 9fans

* Russ Cox <rsc@swtch.com> wrote:

Hi,


> The greatest strength of venti, and also of
> the worm file system, is that once data is written,
> those disk blocks are never changed again.

Yep, but my scenario is not completely worm.
Some data might be removed/unused. Even if it might not be absolutely
necessary, it would be nice to reclaim space.

> Why not just use an ordinary file system?
> What benefit are you deriving from using venti
> that is making all this rewriting worthwhile?

Venti makes lots of things easier, eg. it avoids duplicated data.
For example, if some users upload already existing media, I've
just got one more db record, but no duplicate data. Doing this
at the fs level would require more logic on the application side.

Another, very important, point is that I'm creating a cloud venti,
which synchronizes with its peers on-demand and distributes the
data over the cloud. So I don't need additional logic for
clustering the application / its data spaces.
(I'll also use the venticloud for several other things, eg. for
building a distributed fs or something like S3 on it).


cu




* Re: [9fans] Ideas for gc on venti
       [not found] ` <d39497d76feddcee629a4ea8c7af63d9@quanstro.net>
@ 2008-06-18 20:57   ` Enrico Weigelt
  2008-06-18 21:29     ` Bakul Shah
  2008-06-19 12:46     ` erik quanstrom
  0 siblings, 2 replies; 11+ messages in thread
From: Enrico Weigelt @ 2008-06-18 20:57 UTC (permalink / raw)
  To: 9fans

* erik quanstrom <quanstro@quanstro.net> wrote:

> it makes sense for the storage management function to present an
> idealized block device while hiding details like disk replacement
> and redundancy.

Well, I intend to make venti the storage device itself
(eg. in the form of a hw appliance ;-P). At this point a special
venti could make hw RAID obsolete and also do things like bad
block handling.

RAID has some disadvantages, eg. you have to nail-down partition
sizes and it's not trivial to resize or move around volumes.
A venti-based system (which maybe presents a block device via
venti) can make runtime configuration much easier.


cu




* Re: [9fans] Ideas for gc on venti
  2008-06-18 20:57   ` Enrico Weigelt
@ 2008-06-18 21:29     ` Bakul Shah
  2008-06-18 22:37       ` Skip Tavakkolian
  2008-06-19 12:46     ` erik quanstrom
  1 sibling, 1 reply; 11+ messages in thread
From: Bakul Shah @ 2008-06-18 21:29 UTC (permalink / raw)
  To: weigelt, Fans of the OS Plan 9 from Bell Labs

On Wed, 18 Jun 2008 22:57:27 +0200 Enrico Weigelt <weigelt@metux.de>  wrote:
> * erik quanstrom <quanstro@quanstro.net> wrote:
>
> > it makes sense for the storage management function to present an
> > idealized block device while hiding details like disk replacement
> > and redundancy.
>
> Well, I intend to make venti the storage device itself
> (eg. in the form of a hw appliance ;-P). At this point a special
> venti could make hw RAID obsolete and also do things like bad
> block handling.
>
> RAID has some disadvantages, eg. you have to nail-down partition
> sizes and it's not trivial to resize or move around volumes.
> A venti-based system (which maybe presents a block device via
> venti) can make runtime configuration much easier.

Have you looked at zfs (on solaris, freebsd or macos)?  It
seems to offer most of what you are looking for.

As for venti, you can use something like venti/copy to copy a
subset of trees to a new venti and then reuse all of the old
venti space. This is exactly like a copying GC (only "live
data" is copied). But why bother.  For one thing you can't do
selective file copying without a lot of extra hassle.
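
As a rough illustration of the copying-GC idea (not of how venti/copy is
actually implemented), here is a Python sketch. The two stores are plain
dictionaries keyed by SHA-1 score, and the one-byte `P`/`D` tag that
distinguishes pointer blocks from data blocks is an assumption made up for
this example; `put` and `copy_tree` are hypothetical names.

```python
import hashlib

SCORESIZE = 20  # a SHA-1 score is 20 bytes


def put(store: dict, block: bytes) -> bytes:
    """Write a block into a dict-backed store and return its score."""
    s = hashlib.sha1(block).digest()
    store[s] = block
    return s


def copy_tree(src: dict, dst: dict, root: bytes) -> None:
    """Copy every block reachable from root out of src and into dst."""
    if root in dst:
        return  # shared subtree already copied; its children are there too
    block = src[root]
    if block[:1] == b"P":  # pointer block: tag byte, then child scores
        for i in range(1, len(block), SCORESIZE):
            copy_tree(src, dst, block[i:i + SCORESIZE])
    dst[root] = block  # parent is written only after all of its children
```

Calling `copy_tree` once per root worth keeping fills the new store with
live data only; anything not reachable from those roots stays behind in the
old store, which can then be reused as a whole.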




* Re: [9fans] Ideas for gc on venti
  2008-06-18 21:29     ` Bakul Shah
@ 2008-06-18 22:37       ` Skip Tavakkolian
  2008-06-18 22:54         ` erik quanstrom
  0 siblings, 1 reply; 11+ messages in thread
From: Skip Tavakkolian @ 2008-06-18 22:37 UTC (permalink / raw)
  To: 9fans

one legitimate reason is the liability of keeping a user's data
long after any business arrangements for storing such data have
expired. this applies to kenfs too.





* Re: [9fans] Ideas for gc on venti
  2008-06-18 22:37       ` Skip Tavakkolian
@ 2008-06-18 22:54         ` erik quanstrom
  0 siblings, 0 replies; 11+ messages in thread
From: erik quanstrom @ 2008-06-18 22:54 UTC (permalink / raw)
  To: 9fans

> one legitimate reason is the liability of keeping a user's data
> long after any business arrangements for storing such data have
> expired. this applies to kenfs too.

this is a good point.

are there any fs that have mechanisms to help
apply data retention policy?  if one does offline
backup, deleting only the stuff that needs to
be forgotten can be quite painful.

suppose (as a weak example) the labs' main worm
were subject to the normal business data retention
rules.  there would be a lot of history lost.

it's in the forgetting that memory is made useful.

- erik





* Re: [9fans] Ideas for gc on venti
  2008-06-18 20:57   ` Enrico Weigelt
  2008-06-18 21:29     ` Bakul Shah
@ 2008-06-19 12:46     ` erik quanstrom
  2008-06-19 14:20       ` a
  1 sibling, 1 reply; 11+ messages in thread
From: erik quanstrom @ 2008-06-19 12:46 UTC (permalink / raw)
  To: weigelt, 9fans

> RAID has some disadvantages, eg. you have to nail-down partition
> sizes and it's not trivial to resize or move around volumes.

you seem to be making a general claim about all storage
management solutions that i don't think can be backed
up.

as an example i have no rooting interest in, way back
in 1996, i was able to use aix lvm to migrate a couple
of hundred filesystems in many tens of vgs to tens of
filesystems on a handful of vgs with mirrored lvs.  i
didn't find it hard at all to reallocate or resize
anything.  there were no partitions in sight.

(i sure don't miss dasd.)

> A venti-based system (which maybe presents a block device via
> venti) can make runtime configuration much easier.

combining functionality that is logically distinct is
generally called unmodular, and a layering violation
in this particular scenario.

- erik





* Re: [9fans] Ideas for gc on venti
  2008-06-19 12:46     ` erik quanstrom
@ 2008-06-19 14:20       ` a
  2008-06-19 14:33         ` erik quanstrom
  0 siblings, 1 reply; 11+ messages in thread
From: a @ 2008-06-19 14:20 UTC (permalink / raw)
  To: 9fans

// combining functionality that is logically distinct is
// generally called unmodular, and a layering violation
// in this particular scenario.

i agree with the principle, but i'm not sure it applies in this
case. what's described (at least the part before any "garbage"
collection is done) is really just arena management, not disk
management. the arenas are all defined within venti, and
nothing underneath really has any understanding of how (or
if) they're being used. i don't think there's anything
conceptually wrong with asking venti to be able to manage
which arenas are "live" or not.

of course, i think the specific "deprecated" suggestion is
predicated on the idea that you're going to periodically scan
the entire data log, which doesn't seem like an assumption
that's going to scale all that well (especially in light of the
stated goal of eventual distribution).

and this is certainly not a defense of the garbage collection
idea. i'd be quite averse to any form of automated garbage
collection in venti. i've got a few scores written down in a
notebook which aren't in any root and don't duplicate
blocks in any fs (unless by accident).

it would be nice to be able to selectively & manually mark a
given score as "deprecated" and have any blocks only
associated with that score freed (i've got a few hundred MB
already "wasted" on my venti based on having put a space
in a vac command line in the wrong place, for example), but
i find russ' point about the code to touch written blocks
being entirely bug-free based on not existing to be pretty
darn compelling. that level of safety is worth a lot.

anthony

ps: what'd make me give up on the deletion idea entirely
is some form of authentication in venti, even if it's just
allowing fossil to connect to it via tls using certificates. i
can deal with my own mistakes, but it does make me
slightly uncomfortable being open to DoS attacks.





* Re: [9fans] Ideas for gc on venti
  2008-06-19 14:20       ` a
@ 2008-06-19 14:33         ` erik quanstrom
  0 siblings, 0 replies; 11+ messages in thread
From: erik quanstrom @ 2008-06-19 14:33 UTC (permalink / raw)
  To: 9fans

> // combining functionality that is logically distinct is
> // generally called unmodular, and a layering violation
> // in this particular scenario.
>
> i agree with the principle, but i'm not sure it applies in this
> case. what's described (at least the part before any "garbage"
> collection is done) is really just arena management, not disk
> management. the arenas are all defined within venti, and
> nothing underneath really has any understanding of how (or
> if) they're being used. i don't think there's anything
> conceptually wrong with asking venti to be able to manage
> which arenas are "live" or not.

the case given was that a disk needed replacing.  i can run
your argument the other way and say that venti doesn't care
which disk goes where or how the storage itself is organized.
one should be able to replace a failed drive without involving
venti.

slightly off topic.  we use this to our advantage at coraid,
though we are not using venti.  our mail fs uses aoe storage.
there are not a lot of people expert in the administration of
our fs, but there are many people who can repair a degraded
raid or perform other storage administration.  this requires no
knowledge of the fs.

when you're on call 24/7/365 for fs problems, this is a
wonderful thing.

- erik




