Date: Thu, 29 May 2008 11:12:25 +0200
From: Enrico Weigelt
To: 9fans@9fans.net
Subject: Re: [9fans] Fossil+Venti on Linux

* erik quanstrom wrote:

> > As a more sophisticated approach, I'm planning a *real* clustered
> > venti, which also keeps track of block atimes and copy counters.
> > This way, seldom-used blocks can be removed from one node as long
> > as there are still enough copies in the cluster.  (probably
> > requires a redesign of the log architecture)
>
> one of venti's design goals was to structure the arenas so that
> filled arenas are immutable.  this is important for recoverability.
> if you know the arena was filled and thus has not changed, any
> backup will do.
>
> put simply, venti trades the ability to delete for reliability.

Right, my approach would be a paradigm change.  But my venti-2 would
be used for completely different things: distributed data storage
instead of an eternal log ;P

> since storage is very cheap, i think this is a good tradeoff.

I'm thinking of a scale where storage isn't that cheap ...

> > This still isn't a replicated/distributed fs, but clustered block
> > storage, maybe even a basis for a truly replicated fs.
> > BTW: with a bit more logic, we could even build something like
> > Amazon's S3 on top of it ;-)
>
> what problem are you trying to solve?  if you are trying to go for
> reliability, i would think it would be easier to use raid+backups
> for data stability.

Easier, yes, but more expensive (at least the iron).

> consider this case.  two fs want to add different files to the same
> directory "at the same time".  i don't see how block storage can
> help you with any of the problems that arise from this case.

It shouldn't, just as a RAID can't help a local fs with multiple
users adding files to the same directory.  In my concept, the
distribution of the block storage has nothing to do with the
(eventual) distribution of the fs.  My venti-2 will be like a SAN,
just with content addressing :)  So, instead of a SAN or a local
RAID, you can simply use a venti-2 cloud.  The venti clients
(eg. fossil, vac, ...) don't need any knowledge of this fact.

A venti-based distributed filesystem is a completely different issue.
All nodes would store their (payload) data in one venti (-cloud).
Of course the nodes have to coordinate their actions (through a
separate channel), but this is only required for metadata, not
payload.  Data cache coherency isn't an issue anymore, since a data
block itself cannot change - only a file's data pointers, which
belong to the metadata, will change.

For example, if only one node can write to a file (and writes don't
have to appear to others simultaneously reading the same file, aka.
transaction methodology ;-)), single files could be stored via vac,
and the fs cluster only has to manage directories.  The directory
server(s) then manage the permissions and directory updates.  Each
commit of a new file or file change triggers a directory update.
This can be done transactionally via an RDBMS.
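To make the atime/copy-counter idea from the quoted paragraph at the
top a bit more concrete: each node would keep some per-block
bookkeeping, roughly like this.  (a minimal sketch in Go; the field
names and the replication threshold are placeholders I made up, not
anything venti has today)

    package cluster

    import "time"

    // BlockInfo is the per-block bookkeeping a venti-2 node would keep.
    type BlockInfo struct {
            Atime  time.Time // last access on this node
            Copies int       // copies known cluster-wide (via gossip etc.)
    }

    const minCopies = 3 // assumed replication target

    // Evictable reports whether this node may drop its copy: dropping
    // one copy must still leave at least minCopies in the cluster,
    // and the block must not have been used recently.
    func (b BlockInfo) Evictable(now time.Time, idle time.Duration) bool {
            return b.Copies > minCopies && now.Sub(b.Atime) > idle
    }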
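And the "SAN with content addressing" point above boils down to this:
a block is named by the hash of its contents, so identical blocks
coalesce and a block can never change under its address.  A toy
sketch (an in-memory map stands in for the venti-2 cloud; real venti
speaks SHA-1 scores over a network protocol):

    package cas

    import (
            "crypto/sha1"
            "fmt"
    )

    // Score is a content address: the SHA-1 hash of a block's data.
    type Score [sha1.Size]byte

    type Store struct {
            blocks map[Score][]byte
    }

    func NewStore() *Store {
            return &Store{blocks: make(map[Score][]byte)}
    }

    // Put stores a block and returns its score.  Writing the same
    // data twice is a no-op - which is why stored data is immutable
    // and deduplicated by construction.
    func (s *Store) Put(data []byte) Score {
            sc := sha1.Sum(data)
            if _, ok := s.blocks[sc]; !ok {
                    s.blocks[sc] = append([]byte(nil), data...)
            }
            return sc
    }

    // Get fetches a block by its score.
    func (s *Store) Get(sc Score) ([]byte, error) {
            b, ok := s.blocks[sc]
            if !ok {
                    return nil, fmt.Errorf("score %x: no such block", sc)
            }
            return b, nil
    }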
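The directory commit itself could then look like this.  (again just a
sketch, assuming a made-up dirents table that maps (dir, name) to a
vac score, with SQLite standing in for whatever RDBMS you like; only
the transaction pattern matters)

    package dirserv

    import (
            "database/sql"

            _ "github.com/mattn/go-sqlite3" // example backend only
    )

    // commitFile publishes <dir>/<name> -> score in one transaction.
    // The payload blocks are already immutable in venti, so only this
    // metadata update needs coordination.
    func commitFile(db *sql.DB, dir, name, score string, mode int) error {
            tx, err := db.Begin()
            if err != nil {
                    return err
            }
            // a file change is just a new score for the same entry
            _, err = tx.Exec(
                    `INSERT OR REPLACE INTO dirents (dir, name, score, mode)
                     VALUES (?, ?, ?, ?)`, dir, name, score, mode)
            if err != nil {
                    tx.Rollback()
                    return err
            }
            return tx.Commit() // the update becomes visible atomically
    }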
The fine thing about this concept is that the venti cloud could even
be built from hosts which aren't completely trusted (as long as the
data itself is properly encrypted) - as long as there are enough
copies and you've got enough peerings in the cloud, single nodes
can't harm your data.

cu
--
----------------------------------------------------------------------
 Enrico Weigelt,  metux IT service -- http://www.metux.de/

 cellphone: +49 174 7066481   email: info@metux.de   skype: nekrad666
----------------------------------------------------------------------
 Embedded Linux / Porting / Open-Source QM / Distributed Systems
----------------------------------------------------------------------