9fans - fans of the OS Plan 9 from Bell Labs
From: Nathaniel W Filardo <nwf@cs.jhu.edu>
To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net>
Subject: [9fans] Plan 9 source history (was: Re: source browsing via http is back)
Date: Tue, 10 Feb 2009 20:43:02 -0500
Message-ID: <20090211014302.GP22259@masters6.cs.jhu.edu>
In-Reply-To: <1234305943.4957.189.camel@goose.sun.com>


On Tue, Feb 10, 2009 at 02:45:43PM -0800, Roman V. Shaposhnik wrote:
> On Tue, 2009-02-10 at 17:28 -0500, erik quanstrom wrote:
> > what leads you to believe that that amount of sharing will be
> > significant?
> 
> Just a hunch so far. I don't have hard data to prove anything.
> On the other hand, I'd be surprised if massive updates (not pulling
> in a couple of months) didn't benefit from the sharing.
> 
> Thanks,
> Roman.

I have mirrored, with vac -f, every sources dump from 2002 to
yesterday with 
      -e acme/acid/386 -e acme/acid/alpha -e acme/acid/arm \
      -e acme/acid/mips -e acme/acid/power -e acme/bin/386 \
      -e acme/bin/alpha -e acme/bin/arm -e acme/bin/mips \
      -e acme/bin/power -e acme/mail/386 -e acme/mail/alpha \
      -e acme/mail/arm -e acme/mail/mips -e acme/mail/power \
      -e sys/man/vol1.ps -e sys/man/vol1.ps.gz -e sys/man/vol1.pdf \
      LICENSE* NOTICE acme lib rc sys ;
intending to get all the source and not the binaries.  I patched my vac to
ignore atimes (replacing the vac metadata field with the mtime) to increase
metadata block sharing.  As of 2009/0205 (a convenient snapshot to du), this
represents about 140.7 MB of data per dump.  The entire copy takes 550 MB
(240 MB actual storage in Venti).  (With no sharing whatsoever, this would
be approx. 310 GB.)  I would like to re-archive this with the Rabin
fingerprinting vac for comparison.
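
As a rough cross-check of the figures above, here is a minimal Python sketch
that derives how many dumps the no-sharing estimate implies and how much the
sharing saves.  The four constants are the numbers quoted in this message;
reading MB and GB as decimal units is my assumption, and the function names
are made up for illustration.  Dump sizes vary over time, so the implied
count is only approximate.

```python
# Figures quoted in the message; decimal units (1 GB = 1000 MB) are an assumption.
PER_DUMP_MB = 140.7    # size of one dump as of 2009/0205
COPY_MB = 550.0        # "the entire copy"
VENTI_MB = 240.0       # actual storage in Venti
NO_SHARING_GB = 310.0  # estimate with no sharing whatsoever


def implied_dump_count(no_sharing_gb, per_dump_mb):
    """Number of dumps implied if every dump were as large as the latest one."""
    return no_sharing_gb * 1000.0 / per_dump_mb


def reduction_factor(no_sharing_gb, stored_mb):
    """How many times smaller the stored copy is than the no-sharing total."""
    return no_sharing_gb * 1000.0 / stored_mb


print("implied dumps: %.0f" % implied_dump_count(NO_SHARING_GB, PER_DUMP_MB))
print("reduction vs. 550 MB copy: %.0fx" % reduction_factor(NO_SHARING_GB, COPY_MB))
print("reduction vs. 240 MB in Venti: %.0fx" % reduction_factor(NO_SHARING_GB, VENTI_MB))
```

Under those assumptions the estimate corresponds to a bit over 2,200 dumps,
which is in line with roughly one dump per day over the period covered.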

(In case anybody wants to rush out and recreate the results, it took
roughly 10 to 15 minutes per dump to dispatch all the Tstat requests to
sources.)
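
For anyone budgeting time to repeat the crawl, a small Python sketch of the
total follows.  It assumes exactly one dump per day with none missing, and it
takes 2002/1212 and 2009/0205 as the first and last dump dates; both are
assumptions for illustration, as is the function name.

```python
from datetime import date

# Assumed first and last dump dates; one dump per day is also an assumption.
FIRST_DUMP = date(2002, 12, 12)
LAST_DUMP = date(2009, 2, 5)


def crawl_days(n_dumps, minutes_per_dump):
    """Total wall-clock days needed to crawl n_dumps sequentially."""
    return n_dumps * minutes_per_dump / (60.0 * 24.0)


n_dumps = (LAST_DUMP - FIRST_DUMP).days + 1  # inclusive of both end dates
low = crawl_days(n_dumps, 10)   # at 10 minutes per dump
high = crawl_days(n_dumps, 15)  # at 15 minutes per dump

print("dumps: %d" % n_dumps)
print("sequential crawl time: %.1f to %.1f days" % (low, high))
```

So a full sequential re-crawl would be on the order of two to three weeks
unless the requests are issued in parallel.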

Incidentally, a git repository of the crawls, from 2002/1212 to 2009/0205,
is available at http://mirrors.acm.jhu.edu/trees/plan9native/ .  Git gets
the data down to 165M after a gc run, so perhaps it's a better idea than a
venti-based mirror.  I haven't managed to make my version of Uriel's port
(thanks for the start! :) ) of git do the right thing in enough cases yet,
so the git repo may not be updated for a while, but I figured somebody might
want to play with it in the interim.

--nwf;


