From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 10 Feb 2009 20:43:02 -0500
From: Nathaniel W Filardo
To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net>
Message-ID: <20090211014302.GP22259@masters6.cs.jhu.edu>
References: <1234305943.4957.189.camel@goose.sun.com>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="THYEXwetZJOK3OLY"
Content-Disposition: inline
In-Reply-To: <1234305943.4957.189.camel@goose.sun.com>
User-Agent: Mutt/1.5.18 (2008-05-17)
Subject: [9fans] Plan 9 source history (was: Re: source browsing via http is back)
Topicbox-Message-UUID: 9e5056a6-ead4-11e9-9d60-3106f5b1d025

--THYEXwetZJOK3OLY
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Tue, Feb 10, 2009 at 02:45:43PM -0800, Roman V. Shaposhnik wrote:
> On Tue, 2009-02-10 at 17:28 -0500, erik quanstrom wrote:
> > what leads you to believe that that amount of sharing will be
> > significant?
>
> Just a hunch so far. I don't have hard data to prove anything.
> On the other hand, I'd be surprised if massive updates (not pulling
> in a couple of months) didn't benefit from the sharing.
>
> Thanks,
> Roman.

I have mirrored, with vac -f, every sources dump from 2002 to yesterday
with

	-e acme/acid/386 -e acme/acid/alpha -e acme/acid/arm \
	-e acme/acid/mips -e acme/acid/power -e acme/bin/386 \
	-e acme/bin/alpha -e acme/bin/arm -e acme/bin/mips \
	-e acme/bin/power -e acme/mail/386 -e acme/mail/alpha \
	-e acme/mail/arm -e acme/mail/mips -e acme/mail/power \
	-e sys/man/vol1.ps -e sys/man/vol1.ps.gz -e sys/man/vol1.pdf \
	LICENSE* NOTICE acme lib rc sys ;

intending to get all the source and not the binaries. I patched my vac
to ignore atimes (replacing the vac metadata field with the mtime) to
increase metadata block sharing. As of 2009/0205 (a convenient snapshot
to du), this represents about 140.7 MB of data per dump.
The entire copy takes 550 MB (240 MB actual storage in Venti). (With no
sharing whatsoever, this would be approx. 310 GB.) I would like to
re-archive this with the Rabin fingerprinting vac for comparison.

(In case anybody wants to rush out and recreate the results, it took
roughly 10 to 15 minutes per dump to dispatch all the Tstat requests to
sources.)

Incidentally, a git repository of the crawls, from 2002/1212 to
2009/0205, is available at
http://mirrors.acm.jhu.edu/trees/plan9native/ . Git gets the data down
to 165M after a gc run, so perhaps it's a better idea than a venti-based
mirror. I haven't managed to make my version of Uriel's port (thanks for
the start! :) ) of git do the right thing in enough cases yet, so the
git repo may not be updated for a while, but I figured somebody might
want to play with it in the interim.

--nwf;

--THYEXwetZJOK3OLY
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)

iEYEARECAAYFAkmSLSUACgkQTeQabvr9Tc/GbwCfZwQ11xQYK4xhFajhsrtSFa1E
ntMAn2mQfPSg1VqP/lRUuu9JMgFnG+8x
=vPQi
-----END PGP SIGNATURE-----

--THYEXwetZJOK3OLY--
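
The storage figures quoted in the message can be cross-checked with a
little arithmetic. The sketch below is only a sanity check, not anything
from the original post: the function names are made up for illustration,
it assumes binary units (1 GB = 1024 MB), and it assumes roughly one dump
per calendar day over the 2002/1212 to 2009/0205 range, which the message
does not actually state.

```python
from datetime import date

# Figures quoted in the message.
PER_DUMP_MB = 140.7      # size of one dump, per du on 2009/0205
UNSHARED_GB = 310        # estimated total with no sharing at all
VAC_COPY_MB = 550        # size of the whole vac copy
VENTI_STORED_MB = 240    # actual storage used in Venti
GIT_REPO_MB = 165        # git repository after a gc run

MB_PER_GB = 1024         # assumption: binary units


def implied_dump_count(unshared_gb, per_dump_mb):
    """Number of dumps implied by the unshared total and per-dump size."""
    return unshared_gb * MB_PER_GB / per_dump_mb


def sharing_ratio(unshared_gb, stored_mb):
    """How many times smaller the stored form is than the unshared total."""
    return unshared_gb * MB_PER_GB / stored_mb


def days_inclusive(first, last):
    """Calendar days from first to last, counting both endpoints."""
    return (last - first).days + 1


if __name__ == "__main__":
    dumps = implied_dump_count(UNSHARED_GB, PER_DUMP_MB)
    days = days_inclusive(date(2002, 12, 12), date(2009, 2, 5))
    print(f"implied dumps: {dumps:.0f}, calendar days: {days}")
    for name, mb in [("vac copy", VAC_COPY_MB),
                     ("venti storage", VENTI_STORED_MB),
                     ("git repo", GIT_REPO_MB)]:
        print(f"{name}: about {sharing_ratio(UNSHARED_GB, mb):.0f}x smaller")
```

Under those assumptions the 310 GB estimate implies about 2256 dumps,
close to the 2248 calendar days in the range, so the "no sharing" figure
looks like per-dump size times number of dumps. Since the per-dump size is
taken from a single late snapshot, that estimate is an approximation.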