* [9fans] Fossil+Venti on Linux

From: Enrico Weigelt
Date: 2008-05-25 7:56 UTC
To: Fans of the OS Plan 9 from Bell Labs

Hi folks,

is anyone already running venti+fossil on Linux? I've got several machines
where I need a remote backup (via a DSL link) with a few days of live
snapshots. Ideally the backup should be directly accessible, so that when
one machine goes down for a while, the backup machine can take over. The
ventis could even be clustered (each machine feeding the others), so the
remaining backup work would just be on metadata, right?

cu
--
Enrico Weigelt == metux IT service - http://www.metux.de/
Please visit the OpenSource QM Taskforce:
http://wiki.metux.de/public/OpenSource_QM_Taskforce
Patches / fixes for dozens of packages in dozens of versions:
http://patches.metux.de/
* Re: [9fans] Fossil+Venti on Linux

From: a@9srv.net
Date: 2008-05-25 14:59 UTC
To: weigelt, 9fans

I'm not running Linux, but I've run venti+fossil on Mac OS X for testing.
I intend to use venti there regularly once I figure out how to get OS X
to let me at a raw partition that isn't mounted (anyone?).

I don't think venti+fossil will do what you're looking for, however, at
least not without some additional machinery. Fossil doesn't do any
replication or fail-over: it talks to zero or one ventis. Venti doesn't
automatically replicate anything either, although that's pretty easy to
script if you're willing to accept the exposure of a cron job. It's true
you could run multiple fossils off one venti, but they'll be logically
distinct (just getting the block-aggregation benefits of sharing a venti
backing store).

I believe the Plan B folks did some work with fail-over (amongst other
things) that might be applicable. Beyond that, if you want to get what
you want from venti+fossil, you'll need to inject a filter in front of
one of those two to do the fail-over (and handle all the fun of tracking
writes and propagating them when the server comes back, and so on).

If you're looking to back up *existing* Linux boxes, then fossil might
not be what you want anyway. Take a look at vbackup(8) and friends (I'm
trying to convince it I'm on an HFS+ partition). You'll have to figure
out the correct procedures for your site, but the examples are pretty
useful. Still no automatic fail-over, but a cron job could probably get
you replication.

Anthony
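[Anthony's point that venti-to-venti replication "is pretty easy to
script" from a cron job boils down to a content-addressed sync: only
blocks whose scores the mirror lacks ever need to be sent. A minimal
sketch of that idea in Python -- the dicts stand in for real venti
servers, and none of this is the actual venti protocol:]

```python
import hashlib

def score(block: bytes) -> str:
    """Venti addresses each block by the SHA-1 hash of its contents
    (its 'score'); two identical blocks always share one score."""
    return hashlib.sha1(block).hexdigest()

def replicate(primary: dict, mirror: dict) -> int:
    """Copy every block the mirror is missing; return how many were sent.

    Because the store is content-addressed and blocks are immutable,
    the sync is idempotent: re-running it from cron sends nothing new
    and can never overwrite existing data.
    """
    sent = 0
    for s, block in primary.items():
        if s not in mirror:
            mirror[s] = block
            sent += 1
    return sent

# example: two syncs, the second finds nothing to do
primary = {score(b): b for b in (b"hello", b"world")}
mirror = {}
replicate(primary, mirror)   # sends 2 blocks
replicate(primary, mirror)   # sends 0 blocks
```

[The idempotence is what makes the "exposure of a cron job" tolerable:
a crashed or repeated run costs bandwidth, never correctness.]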
* Re: [9fans] Fossil+Venti on Linux

From: Francisco J Ballesteros
Date: 2008-05-25 15:48 UTC
To: Fans of the OS Plan 9 from Bell Labs

> I believe the Plan B folks did some work with fail-over (amongst other
> things) that might be applicable.

You could adapt Plan B's bns to fail over between different FSs. But...
we learned that although you can let the FS fail over nicely, many other
things stand in the way, making the fail-over of little use on its own.
For example, on Plan 9, cs and dns have problems after a fail-over, your
IP address may change, etc. All of which is to say that even once you
have tolerance to FS failures, you still face other things that do not
fail over.

To tolerate failures, what we do is run venti on a RAID. If fossil gets
corrupted somehow, we just reformat the partition using the last vac. To
survive crashes of the machine with the venti, we copy its arenas to
another machine, also kept on a RAID.

If you want clients to stay up during server crashes, you could use
either bns or recover to pretend the FS is still there (blocked, but
there) while you reboot (or replace) it.

hth
* Re: [9fans] Fossil+Venti on Linux

From: erik quanstrom
Date: 2008-05-25 20:24 UTC
To: 9fans

> To tolerate failures what we do is to run venti on a raid. If fossil
> gets corrupted somehow we'd just format the partition using the last
> vac. To survive crashes of the machine with the venti we copy its
> arenas to another machine, also kept on a raid.

forgive a bit of off-topicness. this is about ken's filesystem, not
venti or fossil.

the coraid fs maintains its cache on a local AoE-based raid10 and it
automatically mirrors its worm on two AoE-based raid5 targets. the
secondary worm target is in a separate building with a backup fs. since
reads always start with the first target, the slow offsite link is not
noticed. (we frequently exceed the bandwidth of the backup link -- now
100Mbps -- to the cache, so replicating the cache would be impractical.)

we can sustain the loss of a disk drive with only a small and temporary
performance hit. the storage targets may be rebooted with a small pause
in service. more severe machine failures can be recovered with varying
degrees of pain. only if both raid targets were lost simultaneously
would more than 24hrs of data be lost.

we don't do any failover. we try to keep the fs up instead. we have had
two unplanned fs outages in 2 years. one was due to a corrupt sector
leading to a bad tag. the other was a network problem due to an
electrical storm that could have been avoided if i'd been on the ball.

the "diskless fileserver" paper from iwp9 has the gory details.

- erik
* Re: [9fans] Fossil+Venti on Linux

From: Enrico Weigelt
Date: 2008-05-26 12:58 UTC
To: Fans of the OS Plan 9 from Bell Labs

* a@9srv.net <a@9srv.net> wrote:

> I don't think venti+fossil will do what you're looking for, however,
> at least not without some additional machinery. Fossil doesn't do any
> replication or fail-over: it must talk to zero or one ventis. Venti
> doesn't automatically replicate anything, either, although that's
> pretty easy to script if you're willing to accept the exposure of a
> cron job.

I intend to code a special cluster-venti, which automatically propagates
new blocks to its peers and asks around when it can't find some block.
This would be a somewhat lazy form of replication (depending on how long
propagation takes). As long as the critical data (metadata, ...) are
distributed fast enough, or otherwise backed up properly, so that at
least a reasonably recent snapshot can always be retrieved, this IMHO
would make it much easier/faster to get services up on another machine.
It doesn't need to be a true fail-over fs; I just want to start from the
last snapshot quickly (as I now would do with a tar'ed backup).

My first exercise will be a little code that dumps the new blocks out to
some directory, plus a separate daemon that sends them out to the peer
ventis. Then I only have to back up fossil's local metadata. Since the
machines tend to hold a lot of identical data, much space and traffic
can be saved.

As a more sophisticated approach, I'm planning a *real* clustered venti,
which also keeps track of block atimes and copy counters. This way,
seldom-used blocks can be removed from one node as long as there are
still enough copies in the cluster. (This probably requires a redesign
of the log architecture.)

This still isn't a replicated/distributed fs, but clustered block
storage, maybe even a basis for a truly replicated fs. BTW: with a bit
more logic, we could even build something like Amazon's S3 on that ;-)

Actually, I don't need a replicated fs at all, just a space- and
traffic-efficient backup mechanism which shares data with the local
fs'es.

cu
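[The spool-and-propagate scheme above can be sketched as a toy model.
All class and method names here are hypothetical -- this is an
illustration of the idea, not the real venti protocol or Enrico's
actual code:]

```python
import hashlib

class ClusterVenti:
    """Toy model of the proposed cluster-venti: writes are spooled and
    lazily propagated to peers; a read that misses locally asks around."""

    def __init__(self):
        self.blocks = {}   # score -> block: stand-in for the local arenas
        self.spool = []    # scores written but not yet propagated
        self.peers = []    # other ClusterVenti nodes

    def write(self, block):
        s = hashlib.sha1(block).hexdigest()
        if s not in self.blocks:
            self.blocks[s] = block
            self.spool.append(s)     # the daemon will ship this out later
        return s

    def read(self, s):
        if s in self.blocks:
            return self.blocks[s]
        for p in self.peers:         # lazy replication: ask the peers
            if s in p.blocks:
                return p.blocks[s]
        return None

    def propagate(self):
        """The separate daemon: push every spooled block to every peer."""
        for s in self.spool:
            for p in self.peers:
                p.blocks.setdefault(s, self.blocks[s])
        self.spool.clear()
```

[The window between write() and propagate() is exactly the "lazy
replication" exposure Enrico mentions: until the daemon runs, a block
exists on only one node, so metadata needs a faster path or a separate
backup.]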
* Re: [9fans] Fossil+Venti on Linux

From: erik quanstrom
Date: 2008-05-26 14:01 UTC
To: weigelt, 9fans

> As a more sophisticated approach, I'm planning a *real* clustered
> venti, which also keeps track of block atimes and copy counters.
> This way, seldom-used blocks can be removed from one node as long
> as there are still enough copies in the cluster. (This probably
> requires a redesign of the log architecture.)

one of venti's design goals was to structure the arenas so that filled
arenas are immutable. this is important for recoverability. if you know
the arena was filled and thus has not changed, any backup will do. put
simply, venti trades the ability to delete for reliability. since
storage is very cheap, i think this is a good tradeoff.

> This still isn't a replicated/distributed fs, but clustered block
> storage, maybe even a basis for a truly replicated fs. BTW: with a
> bit more logic, we could even build something like Amazon's S3 on
> that ;-)

what problem are you trying to solve? if you are trying to go for
reliability, i would think it would be easier to use raid+backups for
data stability. using a ups will do wonders for uptime.

if you're going to use a distributed block storage device to build a
distributed fs, then either your fs can't do any caching at all or
there needs to be a full cache-coherency protocol between the fs and
the block storage. consider this case: two fs want to add different
files to the same directory "at the same time". i don't see how block
storage can help you with any of the problems that arise from this
case.

- erik
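[erik's "filled arenas are immutable" point can be illustrated with a
toy append-only arena. This is a hypothetical model of the invariant,
not venti's actual on-disk format:]

```python
class Arena:
    """Toy append-only arena: once filled, it seals itself and never
    changes again, so any backup of a sealed arena stays valid forever --
    the recoverability property erik describes."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.log = []
        self.sealed = False

    def append(self, block):
        if self.sealed:
            raise PermissionError("sealed: filled arenas are immutable")
        self.log.append(block)
        if len(self.log) >= self.capacity:
            self.sealed = True    # from here on, strictly read-only

    def delete(self, index):
        # the trade erik names: no deletion, in exchange for reliability
        raise PermissionError("venti-style storage never deletes blocks")
```

[A clustered venti that evicts seldom-used blocks, as proposed above,
breaks exactly this invariant -- which is why it would need the log
redesign Enrico anticipates.]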
* Re: [9fans] Fossil+Venti on Linux

From: Enrico Weigelt
Date: 2008-05-29 9:12 UTC
To: 9fans

* erik quanstrom <quanstro@quanstro.net> wrote:

> one of venti's design goals was to structure the arenas so that
> filled arenas are immutable. this is important for recoverability.
> if you know the arena was filled and thus has not changed, any
> backup will do. put simply, venti trades the ability to delete for
> reliability.

Right, my approach would be a paradigm change. But my venti-2 would be
used for completely different things: distributed data storage instead
of an eternal log ;P

> since storage is very cheap, i think this is a good tradeoff.

I'm thinking of a scale where storage isn't that cheap ...

> what problem are you trying to solve? if you are trying to go for
> reliability, i would think it would be easier to use raid+backups
> for data stability.

Easier, yes, but more expensive (at least the iron).

> consider this case. two fs want to add different files to the same
> directory "at the same time". i don't see how block storage can
> help you with any of the problems that arise from this case.

It shouldn't, just as a RAID can't help a local fs with multiple users
adding files to the same directory. In my concept, the distribution of
the block storage has nothing to do with the (eventual) distribution of
the fs. My venti-2 will be like a SAN, just with content addressing :)
So, instead of a SAN or a local RAID, you can simply use a venti-2
cloud. The venti clients (e.g. fossil, vac, ...) don't need any
knowledge of this fact.

A venti-based distributed filesystem is a completely different issue.
All nodes would store their (payload) data in one venti (-cloud). Of
course the nodes have to coordinate their actions (through a separate
channel), but this is only required for metadata, not payload.
Data-cache coherency isn't an issue anymore, since a data block itself
cannot change - only a file's data pointers, which belong to the
metadata, will change.

For example, if only one node can write to a file (and writes don't
have to appear simultaneously to others reading the same file, aka
transaction methodology ;-)), single files could be stored via vac, and
the fs cluster only has to manage directories. The directory server(s)
then manage the permissions and directory updates. Each commit of a new
file or file change triggers a directory update. This can be done
transactionally via an RDBMS.

The fine thing about this concept is that the venti cloud could even be
built of hosts which aren't completely trusted (as long as the data
itself is properly encrypted) - as long as there are enough copies and
you've got enough peerings in the cloud, single nodes can't harm your
data.
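[One way to get the "untrusted hosts" property without losing
content-addressed deduplication is convergent encryption: derive the
block key from the block's own hash, so identical plaintexts still
yield identical ciphertexts. A toy sketch -- the XOR keystream stands
in for a real cipher, the copy-count policy is hypothetical, and
convergent encryption famously leaks plaintext equality to the
storage nodes:]

```python
import hashlib

def convergent_encrypt(block):
    """Encrypt with a key derived from the plaintext's own hash, so the
    dedup property survives encryption. Illustrative only: the XOR
    keystream below is NOT a real cipher."""
    key = hashlib.sha256(block).digest()   # only data owners can derive it
    stream = b""
    counter = 0
    while len(stream) < len(block):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    cipher = bytes(p ^ s for p, s in zip(block, stream))
    score = hashlib.sha1(cipher).hexdigest()  # all the untrusted node sees
    return score, cipher

def can_evict(score, nodes, min_copies=2):
    """A node may drop a block only while enough other copies survive --
    the copy-counter policy sketched above (threshold is made up)."""
    return sum(1 for n in nodes if score in n) > min_copies
```

[The design choice worth noting: because the key comes from the content,
two nodes that independently store the same file converge on one
ciphertext and one score, so the cloud still deduplicates -- at the
price that anyone can test whether you store a *known* plaintext.]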
cu
--
Enrico Weigelt, metux IT service -- http://www.metux.de/
cellphone: +49 174 7066481  email: info@metux.de  skype: nekrad666
Embedded-Linux / Portierung / Opensource-QM / Verteilte Systeme
* Re: [9fans] Fossil+Venti on Linux

From: Christian Kellermann
Date: 2008-05-29 9:27 UTC
To: weigelt, Fans of the OS Plan 9 from Bell Labs

IIRC Russ et al. have written a paper on connecting a venti server to a
distributed hash table (like Chord). I think the words to google for
would be venti and dhash.

http://project-iris.net/isw-2003/papers/sit.pdf

HTH

Christian
--
You may use my gpg key for replies:
pub 1024D/47F79788 2005/02/02 Christian Kellermann (C-Keen)
* Re: [9fans] Fossil+Venti on Linux

From: Enrico Weigelt
Date: 2008-05-29 12:17 UTC
To: Fans of the OS Plan 9 from Bell Labs

* Christian Kellermann <Christian.Kellermann@nefkom.net> wrote:

> IIRC Russ et al. have written a paper on connecting a venti server
> to a distributed hash table (like Chord).
>
> http://project-iris.net/isw-2003/papers/sit.pdf

Sounds very interesting. Is there any source code available?

cu
* Re: [9fans] Fossil+Venti on Linux

From: Russ Cox
Date: 2008-05-29 13:51 UTC
To: weigelt, 9fans

>> http://project-iris.net/isw-2003/papers/sit.pdf
>
> Sounds very interesting. Is there any source code available?

Most of what is described in that paper is now libventi, vbackup, and
vnfs. There was some notion that it would be interesting to try storing
data in a peer-to-peer storage system, but when push came to shove we
just set up a well-equipped Venti server for our own backups. It's got
15TB of raw storage providing about 7TB of venti arenas (mirrored).

The only unreleased piece is a tiny protocol translator I wrote to
convert between the Venti protocol and the DHash protocol. DHash was and
still is a research prototype. You don't want to trust your data to it.

Russ
* Re: [9fans] Fossil+Venti on Linux

From: erik quanstrom
Date: 2008-05-29 12:26 UTC
To: weigelt, 9fans

>> since storage is very cheap, i think this is a good tradeoff.
>
> I'm thinking of a scale where storage isn't that cheap ...

what scale is that?

>> what problem are you trying to solve? if you are trying to go for
>> reliability, i would think it would be easier to use raid+backups
>> for data stability.
>
> Easier, yes, but more expensive (at least the iron).

not sure what you mean by this. suppose i have 10TB to keep in a
redundant fashion. with a two-machine solution, i need 20TB of disk,
since the only sensible way to keep a redundant copy on a second
machine is a full mirror. with a one-machine solution, i don't need any
more disks to have a full mirror, and i have the option of raid5, which
will reduce the number of disks i need to 10TB + 1 disk. since your
model is that the storage is a significant expense, a single raid5
machine would make more sense.

even if you are thinking of an enormous cloud with hundreds of
machines, you could halve the number of machines required by raiding
each node. if cost is an issue, reducing the number of machines is a
benefit. given constant data, fewer machines reduce the obvious --
power, chassis, etc. -- but another important reduction is network
ports. once you outgrow a single 24-port switch, network costs seem to
grow in a super-linear fashion.

- erik
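[erik's disk arithmetic, worked through with hypothetical 1 TB drives
(the drive size is an assumption for illustration; the comparison holds
for any size):]

```python
def disks_mirrored_machines(data_tb, disk_tb=1):
    """Full mirror on a second machine: every terabyte stored twice,
    so a two-machine setup needs double the disks."""
    per_machine = -(-data_tb // disk_tb)   # ceiling division
    return 2 * per_machine

def disks_raid5_single(data_tb, disk_tb=1):
    """Single-machine RAID 5: capacity for the data plus one parity
    disk, regardless of how many data disks there are."""
    return -(-data_tb // disk_tb) + 1

# erik's example: 10TB kept redundantly with 1TB drives gives
# 20 disks for the mirrored pair versus 11 for single-machine raid5.
print(disks_mirrored_machines(10), disks_raid5_single(10))
```

[This is the whole argument in two functions: when storage cost
dominates, the parity scheme nearly halves the disk count -- which is
also what prompts the striping-reliability worry in the next reply.]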
* Re: [9fans] Fossil+Venti on Linux

From: Wes Kussmaul
Date: 2008-05-29 13:33 UTC
To: Fans of the OS Plan 9 from Bell Labs

erik quanstrom wrote:

> with a 1-machine solution, i don't need any more disks to have a full
> mirror and i have the option of raid5 which will reduce the number of
> disks i need to 10TB + 1 disk. since your model is that the storage
> is a significant expense, a single raid5 machine would make more
> sense.

Reliance on striping for redundancy frightens a number of us perhaps
uninformed folks. It just seems like too much could go wrong with such
a complex scheme. We sleep better knowing there's a mirrored drive in
another location.

As for cost, we just imagine it's 2003, a gigabyte costs five bucks,
but our astute purchasing skills got storage for less than a tenth of
that...