On Thu, Feb 12, 2009 at 07:49:58AM -0500, erik quanstrom wrote:
> exactly. the point i was trying to make, and evidently
> was being too coy about, is that 330 odd gb wouldn't
> be as useful a number as the sum of the sizes of all the
> new/changed files from all the dump days. this would
> be a useful comparison because this would give a
> measure of how much space is saved with venti over
> the straightforward algorithm of copying the changed
> blocks, as ken's fileserver does.

Unless I misunderstand how replica works, the 330 odd GB number [1] is useful
as the amount of data that would have to be transferred over the wire to
initialize a mirror. (Since, as I understand it, a replica log of sourcesdump
would have nothing but "add" commands for each $year/$dump/$file entry, and
would therefore necessitate transferring each file separately.)

On the other hand, it's entirely possible that I'm missing some feature of
replica, or that some set of wrapper scripts around it would suffice. If so,
please excuse, and correct, my ignorance.

On the first hand again, given the occasional reports of "replica hosed me",
I'm not terribly keen on trusting it, and I seem to recall that some of the
fixes have involved hand-editing the replica logs on sources. This makes me
suspicious that some of the replica logs frozen in sourcesdump would be
incorrect and would lead to incorrect data on mirrors if used as part of the
scheme.

With a venti & vac (auth/none vac, naturally, so as to not violate filesystem
permissions) based mirror, there's a single score published daily that covers
the entirety of sourcesdump so far, and a single venti/copy -f suffices to
bring any mirror up to date, using at most 550 odd MB if the initial mirror is
empty. [2] (A rough sketch of that pull step is appended below.)

--nwf;

[1] The discrepancy between 550 MB and 330 GB increases as time goes on and as
the slice of sources being mirrored goes from "just some source files that
some schmo thought would be nice to mirror" to "all of it".

[2] Further, 9fs access to sources is grand, but it does take me 10 to 15
minutes to pull down a day's worth of "just some source files", even if
nothing has changed and I use vac -f, due to all the network latency for
Tstat/Rstat requests. This could be improved in a number of ways, but it
strikes me as simpler to use venti/copy to copy only the incremental deltas.
Some brief experiments, transferring blocks from Baltimore back to a machine
in the same neighborhood as sources, indicate that venti/copy -f took 15
minutes for the first copy (2002/1212) and that subsequently copying even a
dump with many changes (2008/0901) took only four. (Git may do even better.)
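
P.S. In case it makes the scheme more concrete, here's a rough rc sketch of
what a mirror's daily pull might look like. The venti host names and the
location of the published score are invented for illustration; substitute
whatever sources actually advertises.

	#!/bin/rc
	# hypothetical venti servers; adjust to taste
	src=sources-venti	# venti holding sourcesdump (assumed name)
	dst=localventi		# the mirror's own venti
	# the day's published score: a single vac: line covering all of
	# sourcesdump so far (the path here is made up for this example)
	score=`{cat /n/sources/extra/sourcesdump.score}
	# -f: fast copy; if a block already exists on $dst, assume its
	# children do too, so only the new deltas cross the wire
	venti/copy -f $src $dst $score

From there, vacfs pointed at the local venti should be able to serve the whole
dump without touching the network again.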